ABSTRACT

This thesis investigates the measurement and analysis of computer systems
at the levels of hardware architecture and operating system kernel design. Both
levels provide the user with a set of instructions: the machine instructions and the
operating system kernel functions. Our viewpoint is always that of the designers
attempting to study the usage of these abstract instructions by measuring the
behavior of actual programs. We take the same approach for the study of the effects
of multiprocessing at these levels.

For the hardware architect, the variability in the workload has always been a
difficult design problem. It seems intuitively clear that different application areas
(scientific, business, process control) present different workloads to a processor. The
important questions faced by the designer in this respect are: are the application
areas different at the lowest level of data structure manipulations, i.e. the instruction
mix level? If so, are they sufficiently different to justify a specialized processor for
each application area? How much performance improvement can be obtained by such
specialized processors across all the programs in a given area? We apply statistical
experimental design techniques to quantify the variance in the instruction mix due to
the various factors in order to answer these questions. Our results indicate that the
variance due to different programs within an area is comparable to the variance across
application areas themselves. This shows that the differences across the application
areas are not significant at the instruction mix level. This is a consequence of the fact
that the machine instructions operate on bits and words whereas it is the operations on
higher level data structures, such as vectors, process control blocks and queues, that
differentiate between the application areas.

In the case of multiprocessors, the study of the contention for shared resources
among processors is very important. At the hardware architecture level, the contention
occurs for the shared memory and shared data paths. This problem has been studied
earlier by others ([BHAN73], [MCCR73], [BASK76]) using analytical models. We have
attempted to measure the memory contention for C.mmp - the Carnegie-Mellon
University's multi-mini-processor. Our study was hampered by the lack of high
resolution measurement tools.

The measurement of the workload and its variation for the operating system kernel
level is complicated by the fact that each operating system kernel has its own
set of primitive functions and comparisons across different operating systems are not
possible with our current understanding of operating systems. We have therefore
decided to defer the general study of this problem. However, one aspect of the
problem that can be attacked is the problem of software lockout in a multiprocessor
operating system.

In order to maintain integrity in a multiprocessor system, certain shared data
objects (such as the list of runnable processes or the list of free blocks of memory)
have to be accessed by only one processor at a time, giving rise to software lockout.
When two or more processors attempt to access the same shared data object at the
same time, only one of them can access it and the others have to wait. The mechanism
used for such mutual exclusion is called a lock. The time lost by a processor while
waiting for a shared object to become free can become a performance bottleneck for
multiprocessors. A hardware monitor was used to measure the contention occurring in
the operating system for C.mmp. The measurements show that while in the
operating system, about ^100 instructions are executed between successive locks and
each locked execution takes about 100 instructions. However, the shared data of Hydra
is organized into enough separate objects that very little time was lost due to
contention for these objects. To the best of our knowledge an experimental
investigation of this problem was not possible in the past. A simple central server
system was used to model the locking behavior and to predict the time lost
due to contention. The predictions were validated against the actual measurements and
the validated model was then used to predict time lost due to contention in larger
systems. Our model predicts that time lost due to software lockout will not be
significant for Hydra even when the number of processors in C.mmp is increased to 'll?.
As other multiprocessor operating systems are designed, the model developed in our
investigation can be used as a guide for the study of their software lockout problems.

The space of performance parameters at various system levels is examined and
strengths of the various measurement tools are discussed with respect to the
parameters. We also identify the hardware monitor as the primary measurement tool at
the levels of hardware architecture and operating system kernel design. Because the
hardware monitor is a versatile tool applicable to other levels as well, we have
investigated many other applications of our hardware monitor, but to lesser depths. A
survey of hardware monitoring techniques is also presented and suggestions are made
regarding the design of future monitors.




A NOTE ON TERMINOLOGY


We have used the terminology introduced by Bell and Newell [BELL71] to describe
computer structures throughout this dissertation. We list below the specific terms used
along with their meaning:

ISP: The logical processor defined by the instruction set, as opposed to its
physical implementation. The ISP of a processor includes such aspects as
instruction formats, register structure, instruction interpretation
algorithms, address calculation, data types and their representation.

PMS: This is the Processor, Memory and Switch notation developed by Bell and
Newell for describing computer structures.

K.mon: This is the PMS name of our hardware monitor. It stands for a control
element (K) for monitoring purposes, hence K(monitor) or K.mon for short.

Host or P.host: This is the processor under measurement. The host computer
has to be a PDP-11 in the current design of K.mon. K.mon is connected to
the Unibus of the host processor.

Supervisor or P.sup: This is the processor controlling the K.mon. It sets up
registers in K.mon, starts it and finally processes the data generated by
K.mon. The supervisory processor is also constrained to be a PDP-11 in
the current design of K.mon.




Hardware Architecture: This is the part of hardware design which deals with the
ISP and its implementation.

Operating System Kernel: This is the level in the operating system where its
most primitive operations are defined. A kernel includes the primitive
synchronization mechanism, the protection mechanism, the low level
resource allocation and scheduling functions.

Software lockout (also called software contention): This is a phenomenon arising
out of the need for mutually exclusive accesses to shared data structures
in a multiprocessor operating system. The software lockout results in a
loss of time while executing certain operating system functions.

1. Introduction


1.1. The Need for Performance Evaluation

The effective measurement of computer systems is an essential part of the
measurement, modelling and optimization cycle of computer systems. Computer systems
have evolved into very complex structures, often consisting of many processing units,
many I/O channels and a wide variety of high performance I/O devices. On the
applications side, the computers are no longer operated by one programmer at a time
executing a single small program. Today's computers are required to satisfy a wide
array of demands from many users simultaneously. The users also expect the computer
systems to obey (often conflicting) constraints of fast response and high component
utilization, high throughput and low software overhead. It is no accident that many
current computer systems are among the most complex man-made objects.

The increase in complexity and size is unfortunately not matched by an
understanding of the dynamic behavior of such a system. This has given rise to the
science (or art) of computer performance measurement and evaluation. The need to
gain an understanding of the dynamic behavior of a computer system is felt by all
computer professionals, whether engaged in design of new hardware and software
systems or in writing efficient applications programs or in running a large installation.
Lucas [LUCA71] has attempted to partition the need for performance measurement into
three areas: design, purchase and optimization studies.

In the area of design and development of new hardware and software systems,
measurement plays a vital role in guiding the designers to make optimal decisions with
respect to design trade-offs. Moreover, many of the designs have to rely on analytical
or simulation models because no operational instance of such systems exists. Hence,
measurements are needed to obtain parameters for such models and to verify the
predictions of the models.

The decisions regarding purchase or leasing of a hardware or software unit can
also benefit from performance measurement. The problem here is to compare the
performance of the unit or system under consideration with some standard or with
other systems. Since operational systems are available, in contrast with the design
problem, direct measurement of parameters of interest is possible.

The third area of application of performance measurement is concerned with
optimizing the operations of a specific computer system. The measurements are needed
to predict (and observe) the effects of small hardware or software changes. The
changes include using a faster (or slower) central processor, acquiring faster or larger
secondary storage devices, addition of primary memory, altering the allocation of
devices to channels, altering the processor scheduling algorithm, altering the paging
and other algorithms. The reason such optimization decisions cannot be made by the
manufacturer once and for all is that the optimum choice depends on the workload
experienced at an installation. The measurements have to be performed continuously
and the system has to be altered (perhaps dynamically) to suit the changes in the
workload.

We have chosen to partition the performance evaluation into levels such that a level
is characterized by the number of machine cycles required for a typical operation at
that level. It is advantageous to do so since the performance parameters,
measurement tools and the applications of the evaluation are different from one such
level to another. The lowest level is the hardware engineering level where the
operations involve one or a few machine cycles. The other levels in increasing
order are the hardware architecture level, the operating system kernel design level,
the systems programming level, the applications programming level and finally the
broad installation management level. These levels are discussed in more detail in the
next chapter. It is not practical to attempt to investigate the performance aspects at all
these levels at once since the parameters and the tools required are widely different.
This dissertation therefore focuses on the study of the performance measurement and
analysis at the hardware architecture and the operating system kernel design levels.

1.2. Overview of the Dissertation

This chapter introduces the area of computer performance measurement and
evaluation and discusses its applications. The next chapter presents the performance
parameters at the various system levels and the measurement tools applicable to these
levels. It also gives the motivation behind the research presented in the rest of the
dissertation. Chapter 3 gives a brief description of our hardware monitor K.mon and
discusses its strengths and weaknesses as they relate to the measurements at the
hardware architecture and the operating system kernel design levels. Chapter 4
describes an in-depth experiment conducted to quantify the variation in the different
application areas in terms of their usage of the PDP-11 instruction set. The complete
statistical design used to quantify the variation is presented. Chapter 5 focuses on
another major experiment relating to the operating system kernel design level for a
multiprocessor system. The software contention arising due to mutually exclusive
accesses to shared data structures is studied for Hydra - the operating system kernel
for C.mmp (the Carnegie-Mellon multi-mini-processor system). A simple central server
model is presented to study the contention. The model is validated against the
measurements and then used to predict the contention for larger C.mmp-like structures.
Chapter 6 presents the various experiments performed using our hardware monitor at
other system levels. Some experiments are minor but some can become a major
research project. Chapter 7 summarizes the dissertation with suggestions for future
research.

Appendix A presents a survey of hardware monitoring techniques. A comparison of
existing commercial and research hardware monitors is presented along with many
conclusions. Trends in the hardware monitor development are identified. Appendix B
gives the complete instruction mix data which includes a short description of the
measured programs. Appendices C through H present the outputs of our other
experiments in detail.

2. Performance Parameters and Measurement Tools

Chapter 1 discusses the goals of performance evaluation and measurement. To
achieve any of these goals, one has to perform experiments to measure many different
quantities of interest. The purpose of this chapter is to identify the performance
parameters of interest and to describe the various measurement tools in the context of
these parameters. Even though our main interest lies in the areas of hardware
architecture and operating system kernel design, we will present the parameters and
tools applicable to other areas as well.

The literature on performance evaluation and measurement contains many instances
of measurements of specific parameters. It is difficult, if not impossible, to enumerate
every parameter measured so far. Moreover, the architectures of existing hardware/
software systems and the emerging technologies generate interest in many
parameters. For example, it is often no longer meaningful to measure the elapsed time
to execute a program, but this was an interesting parameter before multiprogramming
was introduced. Similarly, the advent of timesharing has generated interest in the
response time measurements; and the emergence of multi-processors has given rise
to an interest in parameters like memory contention and effective speed-up factor. Any
list of performance parameters therefore stands to become obsolete as technology
evolves. Just as performance parameters depend on the technology, the values
measured for these parameters are also dependent on technology (that is, the
characteristics of the components used to construct the systems and the way in which
these components are interconnected) and the workload present on the machine when
the measurements are made. These facts should be kept in mind while studying the
performance parameters.

2.1. Classification of Performance Parameters

Various researchers have attempted to classify the performance parameters.
Svobodova [SVOB76b] lists four different aspects of evaluation parameters:

1. Quantity of work a system can handle

2. Ability of a system to fulfil the users' needs (quality of service)

3. Utilization of the system's hardware and software components

4. Internal operation and characteristics of the system's hardware and software
components (underlying factors)

The first two are measures that describe a system as it manifests itself to an external
observer. The last two describe the internal behavior of the system.

Fuller [FULL75] notes that performance measures fall into two fundamental classes:
response time measurements and throughput measurements. Measures in the class of
response time include the time taken to respond to users' commands, time taken to
service a disk request and turn-around time in a batch installation. The class of
throughput measures includes the number of jobs executed per day and also the
utilization of the various components.

We have chosen to classify the performance parameters as belonging to various
levels in a computer system. This classification is advantageous because we believe
that many measurement tools can best be classified as 'belonging' to the same levels.
The idea of considering the performance parameters as belonging to different levels is
not new. Kolanko [KOLA??] describes a scheme of considering levels of abstract
machines, each composed of states that can be abstracted from many states from the
next lower level. A state transition at level h+1 (a transition can be a
measurement parameter) reflects any one of several transitions at level h. Svobodova
[SVOB76b] considers the register transfer, the ISP, the software support and the PMS
as meaningful levels for classification of performance parameters. Our classification
(see figure 2.1) is similar, except that it considers the levels along the 'machine cycles'
axis. We have also attempted to characterize the interest of various computer
professionals along the same axis. Admittedly, such a classification is rather loose, but
it does point out how different computer professionals view the performance
parameters and measurement tools.


[Figure 2.1: performance parameters and measurement tools at the various system
levels, arranged along the 'machine cycles' axis]

2.2. Performance Measurement Tools

Figure 2.1 points out that there are a variety of performance measurement tools,
with each level having a preferred measurement tool. For an installation manager, the
goals of performance measurement are to aid the purchase of new components or
systems and to direct the optimization of existing resources. To achieve these goals,
two main measurements have to be performed: the workload at the installation has to
be characterized and the utilization of various hardware components needs to be
measured. The system accounting log gives the resources consumed by individual user
jobs and so it can be used for workload characterization. Moreover, with a more
sophisticated log, one can get the resource utilization on a per-second basis. The
measurement of overlap of various hardware components, however, has to be obtained
using a hardware monitor in present computer systems.

The reason a software monitor is most applicable at the applications and systems
programming levels is two-fold. First, measurements at these levels need a considerable
amount of structured information in the form of various system queues, job tables and
the association of high level language statements with actual instructions being
executed. Such information is most easily obtained by introducing measurement code
in the appropriate routines. Second, the primitives at these levels take many
thousands of machine cycles to execute, so the overhead caused by inserting
measurement code is not prohibitive.

As one proceeds to lower levels, the overhead caused by software monitoring
becomes significant and moreover, the high level information needed to gather or
interpret the measurements can usually be compressed into a few bits (e.g. whether a
certain user is executing, operating system or user execution). It is therefore natural
to use a hybrid monitor where most of the measurements are performed using the
hardware monitor, but the software on the measured system assists by supplying the
required high level information.

Finally, when one is considering events taking place over a few machine cycles, the
only tool that can be used is a hardware monitor. The machine states required to be
monitored at this level are generally not accessible via software. Measurements at the
cycle level and below are usually conducted by a hardware maintenance engineer and
they are usually performed for the purpose of diagnosis rather than performance
evaluation. We will not be concerned with measurements at this level.

The need for measurements at the hardware bits level gave rise to the hardware
monitors. In fact, most of the early studies involving hardware monitors were
restricted to this level [AFrCir,’}, BCW/}, Jf3K<f>3, PUDIXV'’, lAl:.ir/P]. Clever ways have
been devised, however, to make a hardware monitor useful at upper levels. A simple
address comparator can be used in some systems to identify that a particular user is
executing or that a particular operating system function is being performed. A number
of performance parameters applicable to higher system levels can be measured using a
hardware monitor without altering the software on the measured system at all (e.g. the
average CPU utilization, the execution profile of the operating system, average think
time and compute time). When the software on the measured system is modified to
actively assist the hardware monitor, one can perform measurements on the activity of
a particular user or take into account the effects of program overlays when acquiring
an execution profile or a routine trace. Such a hybrid measurement technique has been
found to be quite useful [SVOB76b, HUGII?^, COLL76, SC(3A7<1].
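
The address comparator idea can be pictured with a small software sketch. It assumes a hypothetical list of sampled program counter values and a hypothetical address range occupied by the operating system; the function name, the trace and the range are illustrative only and are not taken from K.mon, where the comparison would be done in hardware on the bus.

```python
def fraction_in_range(addresses, low, high):
    """Return the fraction of sampled addresses that fall in [low, high]."""
    if not addresses:
        return 0.0
    # Count the samples a comparator set to [low, high] would flag.
    hits = sum(1 for a in addresses if low <= a <= high)
    return hits / len(addresses)


# Hypothetical program counter samples (octal, PDP-11 style) and a
# hypothetical operating system region 0o2000..0o3777.
trace = [0o1000, 0o2000, 0o3000, 0o4000]
os_share = fraction_in_range(trace, 0o2000, 0o3777)
print(f"fraction of samples inside the range: {os_share:.2f}")
```

With the range set to the addresses of one routine instead of the whole operating system, the same count gives one entry of an execution profile.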


Even though figure 2.1 states that the most applicable tool at the level of the
installation manager and applications programmer is a software monitor, we find that
most commercial hardware monitors, by virtue of the software supplied with them, are
geared towards these upper levels. Apart from there being more market at these
levels, we see the following reasons:

1. Parameters at upper levels are composed of parameters at lower levels and
these can be monitored using a hardware monitor. The device overlap and
device utilization certainly need a hardware monitor in current systems.
Moreover, parameters like the average response time actually consist of lower
level parameters like the execution of command interpreter, device initiation to
bring in a new program if required, wait for the device to complete the request
and execution of the new program. Each of these can be measured using a
hybrid or a hardware monitor. Even when such measurements are unable to
give a concrete number for the average response time, they indicate why the
observed response time is large, so corrective action can be taken. In short,
lower level measurements can not only yield upper level parameters, but in
some cases can provide valuable insight.

2. Any software monitor necessarily implies some modification of the existing
operating system, systems programs or user programs (a sampling software
monitor is an exception, but its applicability and accuracy are limited if the
sampling rate is kept low to reduce overhead). Such a modification perturbs the
system and more importantly, can become a source of errors. This gives rise to
the reluctance in gathering information with a software monitor if the same can
be obtained using a hardware monitor.

3. A hardware monitor, once set up, can give quick answers to many questions. For
example, parameters like the average time taken to execute a certain routine or
the distribution of use of different supervisory service calls and the time taken
to complete each call can be obtained without too much effort. Equivalent
measurement in software would require more time and a more detailed
knowledge of the software under consideration.

2.3. Performance Parameters Constituting Our Study

We are interested in the performance measurement and evaluation at the hardware
architecture and operating system kernel design levels. These fields are expanding
rapidly with the advent of technology and it is therefore not appropriate to attempt to
study all aspects of performance related to these areas in depth. We have studied
many parameters of interest in these areas and have extended our study to include
multiprocessor systems as well. Our viewpoint is always that of the designers
attempting to study the hardware architecture and the operating system kernel by
measuring the behavior of the existing systems under actual user programs. Figure 2.2
displays the major parameters constituting our study.

Figure 2.2

Performance Parameters Constituting our Study

                        Uniprocessor systems:            Multiprocessor systems:
                        behavior of systems and          contention due to
                        workload characterization        shared resources

Hardware architecture   The instruction mix and          Memory contention
level                   quantification of its            in C.mmp
                        variability with respect         (chapter 6)
                        to application areas
                        (chapter 4)

Operating system        Study of system behavior and     Software contention
kernel design level     workload characterization is     in Hydra for a
                        made difficult by non-           non-interactive
                        uniformity of primitive          workload
                        functions across operating       (chapter 5)
                        systems

For the hardware architect, the variability in the workload has always been a
difficult problem. It is intuitively clear that different application areas
(scientific, business, process control) present different workloads to a processor. The
important questions faced by the designer in this respect are: are the application
areas different at the lowest level of data structure manipulations, i.e. the instruction
mix level? If so, are they sufficiently different to justify a specialized processor for
each application area? How much performance improvement can be obtained by such
specialized processors across all the programs in a given area? We apply statistical
experimental design techniques to quantify the variance in the instruction mix due to
the various factors in order to answer these questions.
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The  measure  me  nt  o(  Iho  workload  and  ils  variation  (or  the  operatinft  system  kernel 
dcsiRn  level  is  complicated  by  the  fact  thal  each  operating  system  kernel  has  ils  own 
set  of  primitive  functions  and  comparisons  across  different  operating  systems  is  not 
possible  with  our  current  understanding  of  operating  systems.  We  will  therefore  not 
study  parameters  at  this  level  (or  uniprocessors. 

In the case of multiprocessors, the study of the contention for shared resources
among processors is very important. At the hardware architecture level, the contention
occurs for the shared memory and shared data paths, if any. This problem has been
studied earlier by others ([BHAN73], [MCCR73], [BASK76]) using analytical models. We
have attempted to measure the memory contention for C.mmp, the Carnegie-Mellon
University multi-miniprocessor. Our study was hampered by the lack of high
resolution measurement tools. However, the contention problem at the kernel design
level was attacked successfully. The contention arises because, in order to maintain
integrity in a multiprocessor system, certain shared data objects (such as the list of
runnable processes or the list of free blocks of memory) have to be accessed by only
one processor at a time. When two or more processors attempt to access the same
shared data object at the same time, only one of them can access it and the others have
to wait. The mechanism used for such mutual exclusion is called a lock. The time lost by
a processor while waiting for a shared object to become free can become a
performance bottleneck for multiprocessors. A hardware monitor was used to measure
the locking behavior and the contention occurring in Hydra, the operating system for
C.mmp. To the best of our knowledge an experimental investigation of this problem
was not possible in the past. Our study of this important performance parameter
should guide the study of this problem in future multi-processor operating systems.


It is clear from the discussion in the previous section that a hardware monitor is the
most appropriate tool to investigate these parameters. But a hardware monitor is a
versatile tool which can also be applied to performance studies of other system levels.
We have therefore expended some effort in studying the hardware monitoring
techniques in general. The next chapter briefly describes our hardware monitor K.mon
and in chapter 6 we examine how it has been used for measurements relating to the
different system levels.


3.  Description of K.mon


3.1.  Introduction

K.mon is a memory bus monitor for the PDP-11 family of computers, capable of
monitoring every cycle taking place on the PDP-11 Unibus [DEC71]. Since most
computer system components and peripherals on a PDP-11 communicate with each
other via the Unibus, the Unibus is a rich source of information regarding every
activity taking place in the computer system. In principle, it is possible to record
every cycle occurring on the Unibus and post-process the data to obtain the required
information regarding any activity on the system. It is however impractical to record
all the cycles at the rate they occur. Moreover, the post-processing program would have
to be very complex to extract the required information from a large Unibus cycle trace.
K.mon therefore provides very sophisticated event detection mechanisms which enable
the analyst to record only the events of interest, thereby simplifying the task of
recording and post-processing. K.mon gives access to hardware level performance
information not otherwise available. Moreover, the event detection mechanism is
capable of obtaining information regarding software performance without the insertion
of software breakpoints.

This chapter is intended to describe K.mon to a level of detail necessary to
understand the experiments presented in later chapters. The motivation for K.mon and
its design philosophy is discussed by Fuller, Swan and Wulf [FULL73]. For a more
detailed description of the operations of K.mon, the reader is referred to [SWAN76].


3.2.  The Experimental Setup

K.mon presents itself as a passive device on the Unibus of the processor under
measurement (P.host). Another processor (P.sup, for supervisor) is required to control
K.mon and to store the data gathered by it. K.mon thus straddles two Unibuses as
shown in Figure 3.1. The figure shows K.mon connected to C.mmp such that both P.host
and P.sup are processors on C.mmp. It should be noted however, that either or both of
the processors may be conventional stand-alone PDP-11's. In fact, this was the mode
in which K.mon was used for most of our experiments.

All the signals present on the P.host Unibus are available to K.mon. In addition, a set
of 2[illegible] probes is provided to monitor signals not available on the Unibus. The
probes are currently used to monitor the following signals:

1.  the instruction fetch signal, used to distinguish between the instruction fetch
cycles and operand, data or I/O cycles.

2.  a signal indicating that a cycle is initiated by a processor as against initiated by
an I/O device.

3.  three bits of priority level at which the processor is running.

The input signals are tested combinatorially at the occurrence of each Unibus cycle
on P.host to detect events of interest. K.mon can be programmed to record
information such as the address or data involved in the cycle when an event occurs.

1.  There are many advantages in having both P.host and P.sup be processors on C.mmp:
the full resources of C.mmp are available for [illegible] and post-processing, the
statistics gathered by K.mon can be used by the operating system for dynamic tuning,
and the debugging capabilities of [illegible] are also [illegible] by providing access
[illegible] from the operating system.


The information is recorded in a buffer (80 events deep) and is then transferred to the
main memory of P.sup via a standard interface (DEC DR11-B).

P.sup controls K.mon via five command registers which are used for initialization,
setting of the operating modes and reporting exceptional conditions. The event
detection and event response functions in K.mon are completely programmable and
they are determined by a [illegible]-word specification word memory (SWM) which is set
up by P.sup prior to running an experiment. Figure 3.2 shows the block diagram of K.mon
and its relationship to the two processors.


Figure 3.2  K.mon Block Diagram


3.3.  Event Detection

The concept of an event is central to all hardware monitors. An event can be loosely
defined as an occurrence of a particular state on the system under measurement. The
event we are interested in can also be a combination or a sequence of other events.
An event can be as simple as an occurrence of an instruction fetch cycle or as complex
as the occurrence of the first operand fetch cycle after executing the instruction at a
certain location while executing a particular user's program.

Since events can be so complex and since different events need to be monitored for
different experiments, the event detection mechanism is the most important part of a
hardware monitor. In K.mon, events are detected at two levels: primitive and
accumulated. A primitive event can occur on every Unibus cycle. The primitive events
are counted until a specified number of them happen, leading to an accumulated event.
The accumulated events and K.mon's response to them are discussed later in this
chapter.

A primitive event is the lowest level of resolution of K.mon. During each cycle on
the P.host Unibus, all the available signals are latched. The input signals are inspected
simultaneously by four distinct combinatorial logic units to detect four distinct primitive
events. The input signals are divided into four different groups for the purpose of
detecting sub-events which are combined to detect a primitive event. The groups of
input signals are:

1.  16 bits of Unibus address

2.  16 bits of Unibus data

3.  16 bits of probe signals or 8 bits each of probe signals and Unibus cycle length

4.  7 bits of control signals:
      2 bits: Unibus address bits 17 and 18
      2 bits: cycle control signals:
              read, read-pause, write and write-byte
      1 bit:  signal indicating that the cycle is
              an interrupt request cycle
      2 bits: internal flags used for detecting sequences.


The sub-events are detected using two functional units: comparators and pattern
detectors. A comparator performs a 16 bit unsigned arithmetic comparison between its
internal comparison value register and an external signal group (usually the 16 bit
Unibus address). The two result signals are 'Equal' and 'GEQ', indicating that the input
signal is equal to, or not less than, the comparison value register. A pattern detector is
used with the data, probe and control signal groups to detect any particular bit
pattern. It consists of two internal registers: mask and pattern. The mask register is
used to identify the care/don't care bits of the input signal. The result signal 'Match'
is true iff

(input signal ∧ mask register) = (pattern register ∧ mask register)
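
The two functional units can be illustrated with a minimal sketch. This is a software model written for this discussion, not K.mon's actual logic; the function names `comparator` and `pattern_match` are hypothetical.

```python
MASK16 = 0xFFFF  # signal groups are treated as 16-bit unsigned values


def comparator(signal, comparison_value):
    """Model of a comparator: returns the pair (Equal, GEQ).

    GEQ is true when the input signal is not less than the comparison
    value, using an unsigned 16-bit comparison.
    """
    s = signal & MASK16
    c = comparison_value & MASK16
    return s == c, s >= c


def pattern_match(signal, pattern, mask):
    """Model of a pattern detector: Match is true iff the masked input
    equals the masked pattern. A 1 bit in the mask marks a 'care' bit."""
    return (signal & mask) == (pattern & mask)


# Example: only the upper four bits of an 8-bit field are 'care' bits.
equal, geq = comparator(0o1000, 0o1000)
matched = pattern_match(0b10100110, pattern=0b10100000, mask=0b11110000)
```

Setting a mask bit to 0 makes the corresponding input bit a don't-care, which is why a single pattern and mask pair can describe a whole family of bit patterns.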

3.3.1  Combination  of  the  subevents 

Figure 3.3 displays the arrangement of the comparators and the pattern detectors.
For each primitive event there are four sub-events: address match/GEQ, data match,
probes match and control signals match. Each of these sub-events can be tested for
being true or false or can be ignored in defining the corresponding primitive event.
The right half of the control bits pattern detector is used to perform this final
primitive event detection function using the same pattern and mask register concept.
Even though only one event comparator is provided for each primitive event, it is
possible to specify a primitive event which inspects the Unibus address for being in a
certain range by using the GEQ signals from two comparators.
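
The combination step and the range test can be sketched as follows. This is an illustrative model only: the bit assignment of the sub-events and the function names are assumptions made for the example, and the real wiring between comparators is not described in the text.

```python
def geq(address, comparison_value):
    """GEQ result of one comparator (unsigned 16-bit comparison)."""
    return (address & 0xFFFF) >= (comparison_value & 0xFFFF)


def primitive_event(sub_events, pattern, mask):
    """Combine sub-event bits with the same pattern/mask idea.

    A mask bit of 1 means the sub-event is tested (it must equal the
    pattern bit, true or false); a mask bit of 0 means it is ignored.
    """
    return (sub_events & mask) == (pattern & mask)


def address_in_range(address, low, high_exclusive):
    """Range check built from the GEQ signals of two comparators.

    Assumed encoding: bit 0 = GEQ against `low`, bit 1 = GEQ against
    `high_exclusive`. The address is in range when bit 0 is true and
    bit 1 is false.
    """
    sub_events = int(geq(address, low)) | (int(geq(address, high_exclusive)) << 1)
    return primitive_event(sub_events, pattern=0b01, mask=0b11)
```

For instance, `address_in_range(0o1000, 0o1000, 0o2000)` holds, while an address of `0o2000` or `0o777` falls outside the range.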


[Figure 3.3: diagram of the comparators and pattern detectors, including EVENT ACCUM[0]]


3.3.2  Accumulated Events

It is sometimes necessary to determine how many times a certain primitive event
happens between two other events. The event accumulators provide the required
counting function for this purpose. There are four event accumulators, one for each
primitive event. An event accumulator consists of a 16 bit counter and a 16 bit initial
value register. When a primitive event happens, the counter in its event accumulator is
decremented by 1. When the counter reaches zero, an accumulated event is said to
have happened. K.mon responds to an accumulated event by recording certain
information as described in the next section. In addition, the counter is loaded with
the contents of the initial value register to enable subsequent count-down. Note that
the initial value register can be set to zero, causing an accumulated event every time
the corresponding primitive event happens. Another commonly used value in the initial
value register is 2^16 - 1, which is the maximum value it can be set to. The event
accumulator counter then overflows infrequently and can be used to count the number
of times the corresponding primitive event has happened. For example, suppose the
initial value register for primitive event 1 is set to its maximum value. By recording the
value of this counter when other events happen, one can determine how many times
primitive event 1 happened between any other events.

This mechanism is also used for determining the time elapsed between two events. A
special event accumulator is provided for this purpose which counts the occurrences
of a clock tick. In other words, the primitive event for it happens at a constant rate
(programmable to be 1, 16, 256 or 4096 microseconds).
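
A minimal sketch of an event accumulator follows. The class and function names are hypothetical. One detail is an assumption: the text does not say exactly when the zero test happens, so this model fires the accumulated event when a primitive event arrives while the counter is already zero. That choice is made because it reproduces the stated behavior that an initial value of zero yields an accumulated event on every primitive event.

```python
class EventAccumulator:
    """Software model of a 16-bit counter with an initial value register."""

    def __init__(self, initial_value):
        self.initial_value = initial_value & 0xFFFF
        self.counter = self.initial_value

    def primitive_event(self):
        """Count one primitive event.

        Returns True when an accumulated event occurs, in which case the
        counter is reloaded from the initial value register.
        """
        if self.counter == 0:          # assumed firing condition
            self.counter = self.initial_value
            return True
        self.counter -= 1
        return False


def events_between(counter_at_first, counter_at_second):
    """Primitive events seen between two recorded counter values,
    assuming the counter was not reloaded in between."""
    return (counter_at_first - counter_at_second) & 0xFFFF
```

With the initial value set to 2^16 - 1, the counter rarely reloads, so the difference between two recorded counter values gives the number of primitive events that happened in between.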


3.4.  Event Response

When an event is detected, it is sometimes sufficient to just record its occurrence
whereas at other times, it is necessary to record more information like the address or
the data that caused the event to take place. A time stamp and the values of internal
event accumulator counters may also be required for later analysis. In the early
monitors, only summary type information was made available. So the only response to
any event was to increment an internal counter. This is sufficient if only gross average
values of the measured quantities are required. If however, one needs to generate
histograms for constructing analytical or simulation models, more detailed information
has to be obtained by the hardware monitor.

In K.mon, since there are five event accumulators, any combination of up to five
accumulated events can happen simultaneously. In some experiments, when two
accumulated events happen simultaneously, a different action needs to be taken than if
they happen separately. K.mon therefore provides an event response specification
word for each of the 31 combinations of the five accumulated events. When an event
is detected, up to 9 words of information can be obtained. These are: address, data,
probes, miscellaneous signals, clock value and four words giving the number of times
each of the four events has occurred so far (i.e. the counters in the four event
accumulators). Moreover, two internal flags can be set or reset to facilitate detecting a
sequence of events. If a special timer mode is set, flag 0 can be used to enable the
timer. This provides a dynamic event driven mechanism to start and stop the timer.
The miscellaneous signals word contains the flag values, the Unibus control signals and
an identification of which of the accumulated event(s) happened.

In some experiments, the rate at which the data is gathered exceeds the rate at
which it can be transferred on the P.sup Unibus. Two buffers, each with a capacity of 80
events, are used in a double-buffering mode to smooth the flow of data to P.sup.
When the buffers overflow, P.sup is interrupted to take the necessary action such as
disabling event detection or reinitializing its programs. When an overflow happens, no
further data can be gathered until the buffers become free again. Information about
events happening during the overflow condition cannot be obtained.
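
The selection of a response by combination of accumulated events, described earlier in this section, can be sketched as a table lookup. The encoding below (one bit per accumulator, packed into a 5-bit index) and the field names in the example table are assumptions for illustration; the actual layout of the specification words is not given here.

```python
NUM_ACCUMULATORS = 5  # four primitive-event accumulators plus the clock

# Every non-empty subset of five accumulators: indices 1..31.
ALL_COMBINATIONS = range(1, 2 ** NUM_ACCUMULATORS)


def combination_index(fired):
    """Pack five booleans into a 5-bit index; bit i is set when
    accumulator i produced an accumulated event on this cycle."""
    index = 0
    for i, happened in enumerate(fired):
        if happened:
            index |= 1 << i
    return index


def select_response(table, fired):
    """Return the response for this combination, or None when no
    accumulated event happened."""
    index = combination_index(fired)
    if index == 0:
        return None
    return table[index]


# Hypothetical table: record only the address by default, but record
# more words when accumulators 0 and 1 fire together.
table = {index: ("address",) for index in ALL_COMBINATIONS}
table[0b00011] = ("address", "data", "clock")
```

Because simultaneous events map to their own table entry, an experiment can treat "events 0 and 1 together" differently from either event alone, which is the behavior the 31 specification words provide.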

3.5.  Strengths and Weaknesses of K.mon

As we shall see in the following chapters, K.mon has been successfully used for the
study of instruction mixes and the software lockout phenomenon. There are many
other measurements for which K.mon is not an appropriate tool, e.g. memory switch
contention or tracing of all cycles taking place in the system. The amount of
measurement space spanned by a monitor can be loosely defined as the power of the
monitor. Of course, in addition to power, the ease of use and the ease of attachment to
the measured system are also important parameters of a measurement tool. Svobodova
[SVOB76b] defines monitor power as composed of four components: monitor domain,
monitor rate, input width and recording width. For the purpose of discussing K.mon, we
have partitioned the monitor power into the following four dimensions:

1.  The monitor domain of a hardware monitor consists of the signals monitored with
the help of probes. In the case of K.mon, the domain consists of the 18 bits of
memory address, the 16 bits of memory data, 5 bits of control signals and 5 bits
of special signals, making a total of 44 bits.

2.  The monitor rate is the rate at which events can be detected, limited by the
monitor's probes and logic.

3.  The input width of a monitor is the total number of bit comparisons against the
input signals that can be performed for the purpose of defining events in an
experiment. For K.mon, the input width is 4 primitive events times the 44 bits of
its domain.

4.  The recording rate of a monitor is the maximum possible rate (in bits/sec) of
sustained output to the supervisory processor or some secondary storage
device. This dimension of power is a crucial parameter for experiments involving
tracing. For measurements involving counting and sampling, the recording rate
can usually be ignored. For K.mon, the recording rate is limited by the rate at
which it can transfer data on the supervisory processor's Unibus.
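
As a quick check of the figures above, the domain and input width can be computed directly. The variable names are illustrative; the bit counts are the ones listed for K.mon in the first and third items.

```python
# Signal groups making up K.mon's monitor domain (bits per group).
domain_bits = {
    "memory address": 18,
    "memory data": 16,
    "control signals": 5,
    "special signals": 5,
}

monitor_domain = sum(domain_bits.values())          # total bits in the domain
primitive_events = 4                                # one comparison set per event
input_width = primitive_events * monitor_domain     # bit comparisons available

print(monitor_domain, input_width)
```

The product counts every bit that could take part in an event definition, so it is an upper bound on what any single experiment actually uses.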

3.5.1  The Monitor Domain

K.mon is a memory bus monitor, that is, most of the signals in its domain are already
available in one place on the memory bus (the Unibus). Most commercial monitors are
designed for monitoring many different computers and so their domain has to be
established using high impedance probes attached to pins in the host computer. This
approach has many drawbacks (see [SVOB76b] chapter 5) such as inaccessibility of pins,
danger of causing a hardware failure and using a wrong pin. Our experience has indicated
that monitoring the memory bus is sufficient for most experiments and moreover, in most
cases, the software on the host computer does not have to be modified at all. A
special monitor register has been proposed by Hughes ([HUGH??]) which is controllable
by P.host software using special instructions. This register is monitored by a hardware
monitor to gather the information supplied by the software. A memory bus monitor
like K.mon eliminates the need for such a register since any location in the main
memory can act like this register -- the hardware monitor needs only to monitor
accesses to this location. Our experience with RSX11M hooks (see section 6.2.3)
clearly shows the advantages of a memory bus monitor.

It is important to monitor address lines before any address mapping is performed,
so that the address seen by the monitor is the virtual address generated by the program
under measurement and not the absolute memory location address. By doing this, the
measurement is not affected by any dynamic relocation of the program. The fact that
all the peripheral devices are controlled via Unibus registers expands the domain of
K.mon considerably. K.mon's address comparators can be used to monitor the
commands being given to the peripheral devices.

Many of the measurements performed by K.mon can be performed in a
microprogrammable host by suitably altering its microcode. We will refer to this
technique as firmware monitoring in the rest of this chapter. Such a firmware monitor
has a larger domain (e.g. internal registers) but it lacks the ability to monitor events
inside the devices and also the transfers between the devices and memory. It should
be noted that a hardware or firmware monitor has a very restricted domain compared
to a software monitor. A software monitor essentially has all memory locations which
can be read using an instruction as part of its domain. It cannot however monitor
device activity unless the CPU is involved in it.

An entirely new problem arises when one is monitoring a multi-processor or a
network of computers. The domain of a monitor needs to be expanded to include all
the processors in order to study the operation of the system as a whole. Kolanko
[KOLA77] and Tesdata report using a hardware monitor to measure a network of
computers. Our experience with C.mmp suggests that hardware monitoring of a
multi-processor faces two main problems:

a.  The monitor has to accommodate a larger domain where different signals in the
domain are valid at different time instants.

b.  The monitor should be able to handle a higher input rate.

Similar problems arise when using multi-port memory or when the processor uses
different buses for communicating with different memories.

3.5.2  The Monitor Rate

The fact that K.mon is a memory bus monitor sets the maximum useful monitor rate
to be the rate at which memory cycles occur on the host memory bus. This
arrangement forbids any measurements at levels below a memory cycle (e.g. cache
hits, length of a cycle (footnote 2)), but it was used to simplify the synchronization
problems. The high event detection rate of hardware monitors has traditionally resulted
in their use for certain counting type measurements. In fact, since the recording rate of
a hardware monitor is usually low, whenever events are to be detected at a high rate,
the most common response to an event is incrementing a counter depending on the
event that occurred. Svobodova [SVOB76b] has proposed hardware aids for internal
monitoring. These include counters for counting events commonly counted by a
hardware monitor (e.g. memory cycles, instructions, channel use and overlap, various
timers). If processors are designed with such internal hardware aids, many experiments
can be performed using software monitors without the need of an expensive hardware
monitor.


2.  K.mon has to employ separate circuits for measuring the length of a cycle.


3.5.3  The Input Width

The input width of a monitor determines the number of different events that can be
defined in an experiment. The input width of K.mon is fixed at 4 times 44 bits. This
number is somewhat inflated, however, since for most experiments only a part of this
total width is used. For monitors which employ a manual patch board for event
detection, the input width is variable and each experiment is typically defined with the
minimum required width. This approach results in a saving of hardware components
(e.g. comparators, bit masks) at the expense of the ease of use and the time required
to set up an experiment. The loss of flexibility resulting from the loss of a manual
patch board did not restrict the applicability of K.mon.

For multi-processors and computer networks, the input width of a monitor has to be
increased since the number of bits needed for an effective study increases. It was
originally proposed to build one K.mon for each processor in C.mmp. This solution to
the input width problem cannot usually be adopted because of the expense involved.

3.5.4  The Recording Rate

The definition of the recording rate assumes that the hardware monitor has access
to some high speed secondary storage device either directly or through a supervisory
computer. A high recording rate is highly desirable since it lets an analyst take full
advantage of the high input rate of the hardware monitor. The experiments performed
using K.mon indicate that the ability to use the hardware monitor in tracing mode is
very valuable. In the tracing mode, the monitor acts like a filter allowing selected
Unibus cycles to be recorded. The traced data is then post-processed to generate
tables and histograms. There is a trend in hardware monitors to perform data
processing and storage operations within their own hardware. In these monitors a
tracing mode needs to be provided.

The recording rate is affected by many factors. In the hardware monitor itself, high
speed buffers (at least two) need to be provided so that bursts of data can be recorded
without any loss. The high speed buffers need to be emptied into the main memory of
P.sup (if one is used). The final stage is storing this data on a magnetic tape or a
disk. A bottleneck can be present at any of these transfers, which results in a diminished
recording rate. In the case of K.mon, the buffered data inside K.mon is transferred to
P.sup via its Unibus. The recording rate is therefore limited by the speed of P.sup's
Unibus.


4.  The Instruction Mix Experiment


4.1.  Introduction 

In the last two chapters we examined the performance parameters at the various
levels and the hardware monitoring techniques employed to study these
parameters. This chapter describes a major experiment designed to answer some
important questions about the architecture of a computer.

One of the most important sets of measurements at the instruction level is the
instruction usage, i.e. the instruction mix. The instruction mix, combined with the
average time taken to execute each of the individual instructions, yields the average
overall instruction execution time for most straightforward processor implementations.
This is a measure of the raw speed of a computer and so it can be used in comparing
different computers that share the same architecture. It should be noted however, that
in the present generation of computers, the effects of pipeline architecture and cache
memory reduce the importance of the instruction mix in the determination of the
average overall instruction execution time. The main use of instruction mixes is now in
the design and implementation of the instruction set processors.
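
The weighting described above amounts to a frequency-weighted mean of the per-instruction times. The sketch below shows the calculation; the function name, the instruction names chosen and all numbers are made up for illustration and are not measurements from this study.

```python
def average_instruction_time(mix, times):
    """Frequency-weighted mean execution time.

    mix   : instruction -> relative frequency (fractions or raw counts)
    times : instruction -> average execution time of that instruction
    """
    total = sum(mix.values())
    return sum(mix[op] * times[op] for op in mix) / total


# Hypothetical mix and hypothetical times in microseconds.
example_mix = {"MOV": 0.4, "ADD": 0.2, "BR": 0.3, "MUL": 0.1}
example_times = {"MOV": 2.0, "ADD": 2.5, "BR": 1.5, "MUL": 8.0}

average = average_instruction_time(example_mix, example_times)
```

Dividing by the total lets the same function accept raw instruction counts from a trace as well as normalized fractions. As the text notes, this simple model ignores pipelining and cache effects.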

Another important parameter at this level is the usage of addressing modes and
special registers. Earlier computers had special index registers in addition to general
purpose registers. The use of index registers considerably speeds up address
calculations for accessing arrays and other data structures. Similarly, in some
computers, a special 'indirect' bit in the instruction word indicated that the address
provided is actually a pointer to the real address. The frequency of use of index
registers was reported by Gibson [GIBS70] as a separate instruction in the 'indexing'
class because of its importance. Our study is based on the instruction mix for the PDP-11,
for which there are no explicit index registers or indirect bits. Instead, it has 8
operand modes which are used with any of the 8 general purpose registers as
follows [DEC71]:


Table 4.1  PDP-11 Addressing Modes

Mode   Symbolic   Description

0      R          operand is in R
1      (R)        address of the operand is in R
2      (R)+       same as mode 1; but R is incremented after use
3      @(R)+      address of the address of the operand is in R; R is
                  incremented after use
4      -(R)       same as mode 1; but R is decremented before use
5      @-(R)      same as mode 3; but R is decremented before use
6      X(R)       X + value in R is the address of the operand
7      @X(R)      X + value in R is the address of the address of the
                  operand


Registers 6 and 7 have specialized functions. Register 6 is used as a stack pointer
and register 7 is the program counter. Register 6 is usually used with modes 2 (pop),
4 (push) and 6 (parameter access within the stack). Register 7 is usually used with
modes 2 (immediate), 3 (absolute) and 6 (relative). Some instructions have only one
operand (e.g. INCrement, CLR), which is specified by a single mode-register pair.
There are also double operand instructions (e.g. MOVe, ADD, BIt Test) which have a
source and a destination operand each specified using a mode-register pair.
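To make the mode-register pairs concrete, here is a small sketch that splits a double operand instruction word into its source and destination specifiers. It assumes the standard PDP-11 encoding (a 6-bit operand field with the mode in the upper 3 bits and the register in the lower 3 bits, source in bits 11-6 and destination in bits 5-0), which is not spelled out in the text above; the function names are illustrative.

```python
# Symbolic forms of the eight addressing modes.
MODE_NAMES = {0: "R", 1: "(R)", 2: "(R)+", 3: "@(R)+",
              4: "-(R)", 5: "@-(R)", 6: "X(R)", 7: "@X(R)"}

def decode_operand(field):
    """Split a 6-bit operand specifier into (mode, register)."""
    return (field >> 3) & 0o7, field & 0o7

def decode_double_operand(word):
    """Return ((src_mode, src_reg), (dst_mode, dst_reg)) for a 16-bit word."""
    return decode_operand((word >> 6) & 0o77), decode_operand(word & 0o77)

# 010122 (octal) encodes MOV R1,(R2)+ under the assumed layout.
src, dst = decode_double_operand(0o010122)
print(src, MODE_NAMES[src[0]], dst, MODE_NAMES[dst[0]])
```

Counting how often each (mode, register) value occurs per operand position gives the addressing mode usage discussed in this chapter.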
4.1.1  Uses of Instruction Mix

Although the instruction mix has been used in calculating the average
instruction execution speed of a computer, it has found many more uses. Among these
are:

1. Design of future processors.

Design of a new instruction set involves trade-offs between cost of
implementation and power of the instruction set, between hardwired
implementation and microcoded or software implementation, between opcode
encoding and time needed for opcode decoding and so on. Moreover, decisions
have to be made regarding provisions for immediate operands, prefetch of
instructions and operands and number of general purpose registers. All these
decisions need the instruction mix data to make optimum choices.

2. Emulation of an instruction set

Due to the considerable software effort vested in the existing instruction sets, it
is advantageous if new processors can emulate instruction sets of their
predecessors. The efficiency of emulation can be increased by providing special
features in the hardware of the new processor. Some of the emulation can be
performed using microcode; some can be done in software without too much time
penalty. Instruction mix helps in deciding the level at which an instruction can be
emulated.

3. Designing a special purpose implementation of an existing instruction set

The instruction mix data can sometimes indicate that a particular application is
strongly in favor of using certain instructions. In such cases, a special
purpose implementation (in the form of a new model or a hardware option) of
the instruction set optimized for the specific instructions will be useful. The
advantage to be gained by doing this can be quantified using the instruction mix.
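The last point can be sketched numerically. The example below estimates the speedup obtained when one heavily used instruction is made faster, given a mix and per-instruction times. All names and numbers are hypothetical and serve only to show the arithmetic.

```python
def mean_time(mix, times):
    """Average instruction time: sum of fraction * time over all opcodes."""
    return sum(mix[op] * times[op] for op in mix)

def speedup_from_option(mix, times, faster_times):
    """Ratio of average instruction time before and after the special option."""
    improved = dict(times, **faster_times)  # override only the optimized opcodes
    return mean_time(mix, times) / mean_time(mix, improved)

# Hypothetical application: 20% floating adds at 10 us, everything else at 2 us.
mix = {"FADD": 0.2, "OTHER": 0.8}
times_us = {"FADD": 10.0, "OTHER": 2.0}

# Suppose a hardware option brings FADD down to 2 us.
speedup = speedup_from_option(mix, times_us, {"FADD": 2.0})
print(round(speedup, 3))
```

The same calculation with a mix in which FADD is rare would show almost no gain, which is why the variability of the mix matters to the designer.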


4.2.  Review  of  Previous  Work 

Studies of frequency counts for instruction executions have been described by
several authors. The best known is the Gibson mix (see [GIBS70]), developed by Jack
C. Gibson at IBM in 1959. Gonter [GONT69] has compared the Gibson mix with the
University of Massachusetts mix. His results correlate well with Gibson's. The
substance of these results is that LOAD and STORE account for about 30% of the
instructions executed, branches for 16% to 33%, index manipulations 13% to 18% and
arithmetic 3% to 19%. These results depend on both the ISP and the subject program
set. Rather similar mixes are reported by Arbuckle [ARBU66], Connors, Mercer and
Sorlini [CONN70], Raichelson and Collins [RAIC66]. Foster, Gonter and Riseman
[FOST71] have gone one step further, by investigating the effects of reducing the
instruction set. The emphasis of the above studies was mostly on the evaluation of
raw processing capacity of the central processor. The subject programs were limited
to user programs. Lunde [LUND74] measured the instruction mix for the PDP-10 and also
studied the register utilization and commonly occurring instruction sequences.
In fact, the variation in the instruction mix from application to application forms the basis of this chapter.


4.2.1  Methods for Obtaining Instruction Mix


A variety of methods have been used by researchers to obtain the instruction mix.

1. Instruction or machine cycle traces

Earlier mixes were obtained using software traces. In this method, every
instruction (or machine cycle) is recorded to obtain the instruction (and
operand) values and other relevant information. If a hardware monitor is used
for tracing, operating system execution as well as user program execution can
be traced. The problem here is that the internal memory of the hardware
monitor gets filled rapidly and it cannot be emptied into secondary memory fast
enough to avoid overflows [BORD71]. Software methods rely on interpretation of
user programs [LUND74, WIND73], with an interpreter designed to gather
required statistics. The disadvantages of this method are that only user
programs can be traced and moreover, the execution of the traced user
programs gets slowed down by orders of magnitude. This method therefore
cannot be used to obtain the instruction mix for real-time programs. Even with
these disadvantages, tracing has been used because such a trace is a rich
source of information. Apart from calculating the instruction mix, the trace can
be used to gather information on register lives, sequences of opcodes, address
calculation, locality of reference and distance between branches. An instruction
trace can also be used to drive a processor simulator to evaluate various paging
and other algorithms.

2. Microcoded measurement facility

Recently [SVOB76a] microprogrammed processors have been used to gather the
instruction mix data for the instruction set being implemented on such a
processor. Along with interpreting the instruction set, the microcode was
programmed to collect the instruction mix in the fast storage available internal to
the microprocessor. This method is similar to the previous method but it
introduces very little overhead and is certainly cheaper than employing a
hardware monitor. It is however, applicable only to microprogrammed
processors.

3. Jump trace

Alexander [ALEX73] describes a variation of tracing called Jump tracing. In this
method, tracing information is gathered only when the flow of control is altered.
This method introduces less slow-down for the user program but the trace
produced is not as useful as the complete trace. Moreover, to do this entirely in
software requires the compiler to insert appropriate code to activate the tracer
at the proper jump points.

4. Sampling

When detailed information is not required, one can obtain parameters like the
instruction mix or execution profile by sampling the processor state at random.
Software samplers are time driven and interrupt the processor at random times
to obtain the instruction or program counter in use at the time of the interrupt.
Software samplers therefore cannot sample uninterruptable operating system
code. Moreover, separate samplers have to be written for every operating
system. A hardware sampling monitor removes these restrictions. Also, such a
monitor does not cause any overhead or perturbation in the operation of the
system under measurement. We have therefore chosen this approach for our
study.
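The difference between a full trace and a sample can be sketched as follows. The synthetic trace, its opcode weights and the sample size are assumptions made for the example; the point is only that a random sample of executed instructions yields a mix close to the one computed from the complete trace.

```python
import random
from collections import Counter

def instruction_mix(opcodes):
    """Fraction of executions accounted for by each opcode."""
    counts = Counter(opcodes)
    n = sum(counts.values())
    return {op: c / n for op, c in counts.items()}

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Synthetic "complete trace" with hypothetical opcode weights.
trace = rng.choices(["MOV", "ADD", "BR", "CMP"], weights=[40, 20, 25, 15], k=100_000)

full = instruction_mix(trace)                       # mix from every instruction
sampled = instruction_mix(rng.sample(trace, 2_000))  # mix from a random sample

for op in sorted(full):
    print(op, round(full[op], 3), round(sampled.get(op, 0.0), 3))
```

What the sample cannot provide is anything that depends on the order of instructions, such as opcode sequences or distances between branches.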
4.2.2  Indication of Variation in the Instruction Mix

Most of the researchers measuring the instruction mix have reported that the mix is
dependent on the application area chosen for measurement. Let us briefly follow
through the various studies observing the variation in the instruction mix. On the
highest level, there is variation between different processors (i.e. between different
instruction sets). Some of this variation is of course due to the processors being
intended for different application areas and due to non-uniformity of opcode definition
from processor to processor. It is however, instructive to group opcodes in certain
groups for comparing different instruction sets as was done by Gibson. The following
table is reproduced from Lunde.


Table 4.2 The modified Gibson classification
Percentage of the executed instructions in the Gibson classes

Instruction class         Gibson's results   UMASS results   Lunde's results
                          IBM 650/704        CDC 3600        PDP-10
Load, store               31.2               30.0            42.?
Integer add, subtract      6.1                1.2            12.4
Compares                   3.8                1.2             ?
Branches                  16.6               33.3            23.2
Floating add, subtract     6.9                0.5             4.9
Floating multiply          3.3                0.5             2.6
Floating divide            1.5                0.2             1.1
Integer multiply           0.6                0.1             1.1
Integer divide             0.2                0.1             0.5
Shifting                    ?                 2.2             3.9
Logical                    1.6                0.5             1.0
Miscellaneous              5.3                0.0             1.5

Svobodova [SVOB74] has compared a set of instruction mixes for the 360/370
series of processors. These processors have very similar instruction sets. The
variation in the instruction mix results from measurements performed at different
installations. The table given by Svobodova is reproduced below:


Table 4.3 Instruction Mix at Different Installations

                              Installation / Model
Opcode type                   Stanford      Argonne National   UCLA      RCA
                              University    Laboratory                   Laboratories
                              370/145       360/75             360/91    70/45
Integer load, store           36.51         50.?5              25.2?     25.7
and arithmetic
Floating point load,           0.52          2.82              28.62      -
store and arithmetic
Decimal                        0.06          -                  -         ?.0
Branch                        32.74         26.04              18.30     34.2
Logical, compare, move        18.97         17.15              13.41     17.1
Control, I/O                   0.54          0.37               -         -

Note: IBM 370/145, 360/75 are arithmetically oriented.
IBM 360/91 has a powerful floating point unit.
RCA 70/45 has an instruction set almost identical to that of the IBM 360.
The instruction mix was obtained from tracing user programs (mostly Cobol).
- means no information was available in the reference.
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ll»r;(ifs  (I  ri'fiOrl:.  v^ni, 3(1011  in  llin  itvj.trtic.linn  iciiy  l)rlv.'C^n  ii.'.f.r  (;i  »;;i  oinr, 

and the operating system. Finally, Svobodova and Mattson [SVOB76b] report that the
instruction mix can vary between phases of execution of a single program.

4.3.  Statement of the Problem

It can be seen from the above discussion that there is noticeable variability in the
instruction mix from machine to machine, from installation to installation for the same
machine, from program to program in the same installation and even from phase to
phase in the same program. For programs written in assembly languages, the
instruction mix will depend on the particular programmer writing the program. This
makes the use of the instruction mix for design and optimization of processors
somewhat difficult. Any measurement of the instruction mix which does not span all
areas of application of the measured processor, cannot be assumed to represent the
overall instruction mix experienced by the processor. Since the extent of variation in
the instruction mix is not known, the processor designer faces the following pitfalls:

1. If the variation in the instruction mix is significant, the cost/performance ratio
degrades if a non-representative mix is measured and more attention is given to
unimportant instructions at the expense of important instructions.

2. A processor optimized for a balanced instruction mix is suspected of being
unoptimized for a particular application area and is therefore not used even
though the actual variance in the instruction mix from area to area is negligible.

It is important to do a scientific study to quantify the variance caused by the
different factors in the overall instruction mix. Quantification of the variance aids the
hardware architect in avoiding the above pitfalls. The variation in the instruction mix is
caused by many factors, some of which were discussed in section 4.2.2. We list below
the most important factors that have an impact on the instruction mix:

1. The instruction set of the processor

2. The broad application area

3. The individual programs belonging to the different application areas

4. The different phases of execution in a program

5. The compiler used to translate the high level program into the machine language

6. The individual programmer in the case of assembly language programs

We have decided not to investigate the variance caused by the differences in the
instruction set and by the individual programmers since these will form complete
studies by themselves. The methods used in our study can however be extended to
study the variance caused by these two factors also. We will also not study the
variance caused by the use of different compilers mainly because if such a study is not
to become too compiler specific, it needs many different compilers written for the
same language and for the same processor. This was not possible even for the most
popular languages viz. Fortran and Cobol.

The goal of our experiment is to compare the instruction mix for different
application areas, programs and execution phases in the programs to quantify the
variance caused by each of these factors. If a certain broad application area (business,
real time) is found to have a significantly different instruction mix, it might be
worthwhile to design/implement a special processor or option for that area. But even
in a single application area, all the programs cannot be expected to exhibit the same
instruction mix. If the variance due to individual programs is found to be comparable to
the variance caused by application areas, it will not be possible to optimize the
processor for particular areas. A large variance resulting from the different
phases will defeat any attempts at optimizing a processor (or microcode used for
implementation) even for a program.

The statistical model presented in the next section is designed to address precisely
these questions. As a by-product, it will yield the instruction mix composed of over 10
million instructions selected at random from a large number of PDP-11 programs. We
now describe an experiment designed to study the variation in the instruction mix
caused by the three sources. This type of design is well known in statistics as "Nested
(hierarchical) design" (see [ANDE74], [SNED67] chapter 10). Consider the instruction
mix as a vector of the fractional utilizations of different opcodes arranged in some
fixed order. The model being used is as follows: ^

$$\mathrm{opcode}^{i}_{(jk)l} = \mu^{i} + A^{i}_{j} + P^{i}_{(j)k} + S^{i}_{(jk)l}$$

where

i = 1, 2, ..., N   (N = number of different opcodes)
j = 1, 2, ..., a   (a = number of different areas)
k = 1, 2, ..., p   (p = number of different programs within an area)
l = 1, 2, ..., s   (s = number of different segments within a program within an area)

and

$\mu^{i}$ = overall mean fraction for the i-th opcode
$A^{i}_{j}$ = effect of the j-th area on the i-th opcode


^ Actually, our experiment can also be considered as an example of a nested (hierarchical) mixed model. The
reason is that our application areas have not been chosen at random from a large number of available
application areas. They are really areas that we want to investigate. The programs within each area and
the segments within each program are however random samples from a large set. The analysis does not deviate
under the 'fixed' model; only the interpretation of the results changes. Whereas in the random model we can
extend our results to all application areas on a PDP-11, in the 'fixed' model, the conclusions drawn are
restricted only to these specific areas. However, since the concept of an application area is not precise and
since we have spanned almost all application areas on the PDP-11 with our 5 areas, we have decided to ignore
the distinction between a 'random' and a 'fixed' factor model.




$P^{i}_{(j)k}$ = effect of the k-th program in the j-th area on the i-th opcode

$S^{i}_{(jk)l}$ = effect of the l-th segment in the k-th program in the j-th area
on the i-th opcode

The quantities A, P and S have the following distributions:

$A^{i}_{j}$ is taken from $N(0, \sigma_A^2)$ for all i

$P^{i}_{(j)k}$ is taken from $N(0, \sigma_P^2)$ for all i, j

$S^{i}_{(jk)l}$ is taken from $N(0, \sigma^2)$ for all i, j, k

where $N(\alpha, \beta)$ represents a normal distribution with mean $\alpha$ and
variance $\beta$.
The analysis of variance table (ANOVA) for each opcode is as follows:


Table 4.4 The ANOVA Table

Source of variance          Degrees of freedom   Expected mean square
Application areas           a-1                  $\sigma^2 + s\sigma_P^2 + sp\sigma_A^2$
Programs within an area     a*(p-1)              $\sigma^2 + s\sigma_P^2$
Segments within programs    a*p*(s-1)            $\sigma^2$

where

$\sigma^2$ = variance due to segments
$\sigma_P^2$ = variance due to programs
$\sigma_A^2$ = variance due to application areas.

Once the three variances are estimated for an opcode, it is possible to compare
their magnitudes with each other to determine which of these are significant.
The variances are estimated using the measured instruction mix data as
described below.

Let us define -

$\overline{\mathrm{opcode}}^{i}_{jk}$ = average fraction of the i-th opcode over all segments
for the k-th program in the j-th area,

$\overline{\mathrm{opcode}}^{i}_{j}$ = average fraction of the i-th opcode over all programs
for the j-th area,

$\overline{\mathrm{opcode}}^{i}$ = average fraction of the i-th opcode over all areas.

The formulas for the sums of squares are as follows:


Table 4.5 The Measured Sums of Squares

Sum of squares between     Formula for sum of squares

application areas          $s\,p \sum_{j=1}^{a} \left(\overline{\mathrm{opcode}}^{i}_{j} - \overline{\mathrm{opcode}}^{i}\right)^2$

programs within areas      $s \sum_{j=1}^{a} \sum_{k=1}^{p} \left(\overline{\mathrm{opcode}}^{i}_{jk} - \overline{\mathrm{opcode}}^{i}_{j}\right)^2$

segments in programs       $\sum_{j=1}^{a} \sum_{k=1}^{p} \sum_{l=1}^{s} \left(\mathrm{opcode}^{i}_{(jk)l} - \overline{\mathrm{opcode}}^{i}_{jk}\right)^2$


The sums of squares are divided by the corresponding degrees of freedom giving
the mean squares. The mean square for segments within programs directly estimates
$\sigma^2$. The other two mean squares can be used to estimate $\sigma_P^2$ and
then $\sigma_A^2$.
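The estimation procedure can be sketched in a few lines. The function below takes the measured fractions of one opcode in a balanced nested layout (areas, programs within areas, segments within programs), forms the three sums of squares and mean squares, and solves the expected mean square expressions for the variance components. The function name, the data layout and the tiny data set are assumptions made for illustration.

```python
def nested_variance_components(data):
    """data[j][k][l] = fraction of one opcode in segment l of program k in area j.

    Assumes a balanced design: a areas, p programs per area, s segments per program.
    """
    a, p, s = len(data), len(data[0]), len(data[0][0])
    mean = lambda xs: sum(xs) / len(xs)

    prog_mean = [[mean(segments) for segments in area] for area in data]
    area_mean = [mean(pm) for pm in prog_mean]
    grand_mean = mean(area_mean)

    # Sums of squares between areas, programs within areas, segments in programs.
    ss_area = s * p * sum((m - grand_mean) ** 2 for m in area_mean)
    ss_prog = s * sum((prog_mean[j][k] - area_mean[j]) ** 2
                      for j in range(a) for k in range(p))
    ss_seg = sum((x - prog_mean[j][k]) ** 2
                 for j in range(a) for k in range(p) for x in data[j][k])

    # Mean squares: divide by the degrees of freedom.
    ms_area = ss_area / (a - 1)
    ms_prog = ss_prog / (a * (p - 1))
    ms_seg = ss_seg / (a * p * (s - 1))

    return {
        "ms_area": ms_area, "ms_prog": ms_prog, "ms_seg": ms_seg,
        "var_seg": ms_seg,                          # estimates sigma^2
        "var_prog": (ms_prog - ms_seg) / s,         # estimates sigma_P^2
        "var_area": (ms_area - ms_prog) / (s * p),  # estimates sigma_A^2
    }

# Tiny made-up data set: 2 areas x 2 programs x 2 segments.
result = nested_variance_components([[[1, 3], [5, 7]], [[9, 11], [13, 15]]])
print(result)
```

Because the program and area components are obtained by subtracting mean squares, the estimates can come out negative when the higher-level mean square happens to be smaller than the one below it.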
Testing the Null Hypothesis

There are two null hypotheses for every opcode that can be tested with our design
of the experiment.

Hypothesis 1: There is no difference in the fraction of use of the opcode from application
area to application area, that is, $\sigma_A^2 = 0$.

Hypothesis 2: There is no difference in the fraction of use of the opcode from program to
program within a given application area, that is, $\sigma_P^2 = 0$.

These hypotheses can be tested using the analysis of variance procedure. Consider
the ratio (see table 4.4) -

$$F1 = \frac{\text{Mean square between areas}}{\text{Mean square between programs}} = \frac{\sigma^2 + s\sigma_P^2 + sp\sigma_A^2}{\sigma^2 + s\sigma_P^2}$$

If hypothesis 1 holds ($\sigma_A^2 = 0$), the ratio F1 should be 1. However, if the hypothesis
is false ($\sigma_A^2 > 0$), F1 should be > 1. Moreover, the greater the dependence of the
fractional use of the opcode on the application area, the larger will be the value of
$\sigma_A^2$ and so will be the value of F1. The theoretical 1% points for
the F ratio distribution have been tabulated for various degrees of freedom in the
numerator and denominator. The observed F value exceeds this tabulated value in 1% of
the cases even when the null hypothesis is true. The tabulated value is used in the
following way: If the observed F is much less than the tabulated value, then the null
hypothesis holds. On the other hand, if the observed value is larger than the tabulated
value, then we can say with 99% confidence that the null hypothesis is false (in other
words, the variance is significant at the 1% level). When the observed value is only
slightly less than the tabulated value, we cannot make a strong statement regarding
the null hypothesis.

To test hypothesis 2, we use the variance ratio -

$$F2 = \frac{\text{Mean square between programs}}{\text{Mean square between segments}} = \frac{\sigma^2 + s\sigma_P^2}{\sigma^2}$$
Whenever the above method suggests that a particular variance is significant at the
1% level, it is interesting to estimate the confidence limits around the measured value
of the variance (the formulas for the upper and lower bounds are given
in [SNED67]). This tells us how much variability can be expected in the measured
variance if we were to do the whole experiment again. If we perform the experiment
with more application areas or with more programs in each area, these confidence


The variance ratio is usually denoted by F. We will call it F1 (and the corresponding ratio for
hypothesis 2, F2).


G 


In  oi$r  rKpftfimj 'it.a*  r*  nud  fi»  r»  r>o  thi*  ptntii  for  fl  is  \J  funnl  for  f  P  it  1  HI 


limits will narrow, but the variance caused by these factors in an opcode will not be too
different. Since the instruction mix for a segment is not composed of smaller
measurements, we cannot determine the confidence limits around the variance due to
the segments.

4.5.  Details of the Experiment

In the actual design of the experiment we faced the following tradeoff: to gather an
instruction mix representative of the instruction execution of the whole PDP-11 family
of computers, we need to measure a large number of application areas from all the
areas in which a PDP-11 has been used; but on the other hand, as the number of
application areas goes up, the time required to gather and run a few (at least 3)
representative programs in each area also goes up. We have therefore restricted our
experiment to five areas which represent most of the instruction executions on the
PDP-11 family of computers. Within each area, we have selected five representative
programs that were being used by other users, that is, we did not study synthetic
benchmarks. Within each program, we measured 2'J segments and each measurement
consisted of sampling 20000 instructions at random and constructing the instruction
mix vector. The following areas and programs were used:

1. Scientific Fortran benchmarks: 5 user benchmarks

2. Business Cobol benchmarks: 4 user benchmarks, plus 1 sort

3. Operating systems: RSX11-M, RSX11-D, RT11, RSTS, Hydra

4. Systems programs: Fortran and Cobol compilers, BASIC interpreter, macro
assembler and the linker.

5. Device oriented systems: graphics terminal, front end processor, Xerographic
controller, processor 0 on C.mmp (heavily loaded with I/O devices), Cm*
(controlling a collection of LSI-11 processors).

Since the operating systems and the real time device oriented systems were to be
studied, the only tool that was applicable for all the areas was a hardware monitor. This
let us measure the instruction mix without perturbing the measured system in any way.

Because of limitations of the hardware monitor used (Kmon), we could not record
every instruction taking place on the measured processor. Such a trace of instructions
requires a very high bandwidth output device for receiving the data from the
hardware monitor. We were therefore restricted to sampling instructions at random.
There is actually no simple way to specify selection of instructions at random in Kmon.
Kmon can be set up to select every n-th instruction occurring on the Phost unibus,
where n >= 1. Unfortunately the value of n has to be specified before the
experiment begins and it then remains constant throughout the experiment. We usually
chose a prime number for n (typically 31 or \?1) so that the problem of Kmon
synchronizing with small loops on the Phost was avoided. There is still the possibility
of a loop of some exact multiple of n instructions, but we felt that this problem is not
significant since in our experiments such loops were broken up due to the following
events:

occurrence of a clock or a device interrupt

a pause in the data gathering due to overflowing Kmon's internal data gathering
buffers. The overflows occur because of the slow output link.
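The synchronization problem mentioned above can be illustrated with a small simulation. The sketch below selects every n-th executed instruction of an endlessly repeating loop; the four-instruction loop and the values of n are assumptions for the example and do not model Kmon itself.

```python
from collections import Counter

def sample_loop(loop, n, samples):
    """Select every n-th executed instruction of an endlessly repeating loop."""
    return Counter(loop[(i * n) % len(loop)] for i in range(1, samples + 1))

loop = ["MOV", "ADD", "CMP", "BNE"]   # hypothetical 4-instruction loop

locked = sample_loop(loop, 4, 1000)   # n is a multiple of the loop length
spread = sample_loop(loop, 31, 1000)  # prime n, no common factor with the loop length

print(locked)   # every sample falls on the same instruction
print(spread)   # samples are spread evenly over the loop
```

When n shares no factor with the loop length the sampler visits every instruction of the loop equally often, which is the reason for preferring a prime n.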

It is also possible to sample the instructions such that the first instruction occurring
after every n microseconds is selected for analysis. This method is however, not
suitable for estimating the instruction mix since it actually gives the distribution of the
time spent in executing the various instructions instead of the frequency of their
execution. Since we could not obtain a record of consecutive instructions, a study of
frequently executed instruction sequences could not be performed. In a later chapter
we describe a separate experiment to study the instruction sequences.

Even when sampling instructions at random, it was not possible to record the
occurrence of every sampled instruction because of the low speed of our output
device (a 300 baud line to the PDP-10). We had to perform some data compression in
the supervisory computer before storing the data for post-processing. Each sampled
instruction is used to update the appropriate counters in the main memory of the
supervisory computer. After 20000 instructions are processed (i.e. one segment), the
counter values are stored on some output device for later processing. The counters
are maintained for the following: each of the PDP-11 opcodes, 8 modes and 8 registers
for single operand instructions, 8 modes and 8 registers each for the source and the
destination operand for double operand ^ instructions. This data compression reduces
the amount of data that has to be transmitted to the output device for post-
processing, but it prevents us from studying the occurrence of cross-products of
addressing modes and registers or of source and destination modes. This study
therefore cannot answer questions like how many times either the source or the
destination mode is zero for the MOV instruction?
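The loss of cross-product information can be shown with a minimal sketch of such counters. The class below is a hypothetical simplification (it keeps only opcode, source mode and destination mode counts) and is not the supervisory program used in the experiment.

```python
from collections import Counter

class SegmentCounters:
    """Separate (marginal) counters kept for one segment instead of a full trace."""

    def __init__(self):
        self.opcode = Counter()
        self.src_mode = Counter()
        self.dst_mode = Counter()

    def update(self, opcode, src_mode, dst_mode):
        self.opcode[opcode] += 1
        self.src_mode[src_mode] += 1
        self.dst_mode[dst_mode] += 1

# Two different instruction streams, given as (opcode, source mode, destination mode).
stream_a = [("MOV", 0, 2), ("MOV", 2, 0)]   # each MOV has exactly one mode-0 operand
stream_b = [("MOV", 0, 0), ("MOV", 2, 2)]   # only one MOV uses mode 0 at all

a, b = SegmentCounters(), SegmentCounters()
for ins in stream_a:
    a.update(*ins)
for ins in stream_b:
    b.update(*ins)

# The stored counters are identical although the streams differ.
same = (a.opcode, a.src_mode, a.dst_mode) == (b.opcode, b.src_mode, b.dst_mode)
print(same)
```

Both streams produce the same counters, yet "either operand uses mode 0" is true for two instructions in the first stream and for only one in the second, so that question cannot be answered from the compressed data.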

Most of our measurements were performed on the PDP-11 models 20 and 40. These
models do not have a sophisticated floating point or business instruction set. If we
had used the newer models, we would have certainly discovered more of the
floating point and business specific instructions in the scientific and business
application areas. Moreover, the Fortran and Cobol compilers have been improved
since the measurements were performed and the new compilers are expected to use
the instruction set more widely.


^ The single operand instructions are CLR(B), COM(B), INC(B), DEC(B), NEG(B), TST(B), ROR(B), ROL(B),
ASR(B), ASL(B), SWAB, ADC(B), SBC(B), SXT. The double operand instructions are MOV(B), CMP(B),
ADD, SUB, BIT(B), BIC(B), BIS(B).

4.6. Results of the Instruction Mix Experiment

The results of our experiment are presented in Table 4.6. Only those opcodes and
addressing modes which show significant use (more than 0.01 percent) are included in
the table. The complete instruction mix is given in appendix B. For the variance due to
application areas and programs, the 90% confidence limits around the variance are
given in square brackets. If the variance is larger than 100, the confidence limits are
omitted. The variance is reported as 0 if it is less than 0.001. Some values of the
variance are negative and these values and their confidence limits are not given. A
negative variance means that the variation is less than what would be expected if the
opcode fractional usage values were drawn from a single normal distribution. We can
interpret the negative variance to mean that the variance is small. The total variance
for an opcode is always positive. Our particular model attempts to split this total
variance into three parts. It just so happens that the particular values of data collected
sometimes lead to a negative value for one of these three parts. The observed F1 and
F2 values are also given for each opcode and the addressing modes and registers. The
significant F values are flagged. Note that all the variances due to programs
are significant but quite a few variances due to application areas are not. It is also
interesting to observe how the instructions and the addressing modes are being used
by programs in each application area. Table 4.7 presents the instruction mix by
application area.




Table 4.6

The Instruction Mix and its Variance

Number of application areas: 5
Number of programs per appl. area: 5
Number of segments per program: 24
Number of instructions per segment: 20000
Total number of instructions sampled: 12 million

Columns: opcode; overall mean; overall standard deviation; variance due to
application area; variance due to programs; variance due to segments; F1; F2

V 

31.305  2.170 

6(0.22] 

75(47.82) 

S.:.'58 

1 

2:;o 

t: 

':«.537  1.G75 

1]  [2.13] 

12  (8. 14) 
4.7. Conclusions

In this chapter we described a major experiment set up to address the question
of the variability of the instruction mix. The measurements indicate that a statistically
significant variation in the instruction mix is caused both by the application area and
the individual programs in a given area, but not by the different phases of execution in
the same program. Moreover, different instructions exhibit different behavior. The most
heavily used instruction (MOV) is affected more by the individual programs than the
application areas. In other words, we cannot speed up the execution of the MOV
instruction and hope to achieve the same level of speedup for all programs, even in a
specific application area. Similar conclusions can be drawn for other instructions as
well. It is therefore not proper to attempt to over-optimize a processor for a
particular application area.

The overall means and the standard deviations reported for individual instructions
are important in their own right. The overall standard deviation and the individual
variances are to be used as follows:

Suppose we make another measurement of the instruction mix using A1
areas, P1 programs in each of the areas and S1 segments in each
program. Note that any of the quantities A1, P1 or S1 can be equal to 1.
We then calculate the percentage of usage of, say, the MOV instruction to
be M1. We can then say that M1, as an estimate of the overall percentage
of usage of the MOV instruction for all the executions on a PDP-11, has a
variance equal to

\[ \frac{\sigma_A^2}{A1} + \frac{\sigma_P^2}{A1 \cdot P1} + \frac{\sigma_S^2}{A1 \cdot P1 \cdot S1} \]

where the variances \sigma_A^2, \sigma_P^2 and \sigma_S^2 (due to application areas,
programs and segments respectively) are those given in Table 4.6 for the MOV
instruction.
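
As a worked illustration of this rule, the following Python sketch evaluates the variance of the estimate for a given experiment size. The function name and the three variance components used in the example are hypothetical placeholders, not values taken from Table 4.6.

```python
def variance_of_estimate(var_area, var_prog, var_seg, a1, p1, s1):
    """Variance of the estimated usage percentage for a nested design
    with a1 areas, p1 programs per area and s1 segments per program."""
    return (var_area / a1
            + var_prog / (a1 * p1)
            + var_seg / (a1 * p1 * s1))


# Placeholder variance components, chosen only to show the arithmetic.
single_segment = variance_of_estimate(4.0, 60.0, 6.0, 1, 1, 1)
small_study = variance_of_estimate(4.0, 60.0, 6.0, 2, 3, 4)
```

With these placeholder components a single segment from a single program gives a variance of 70.0, while 2 areas with 3 programs each and 4 segments per program give 12.25. Adding areas shrinks every term, whereas adding segments shrinks only the last one.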

Figure 4.1 displays the PDP-11 instruction frequency distribution encountered in our
study. It is interesting to observe that only 10 instructions account for almost 70
percent of the instructions executed on a PDP-11. It can be seen from Table 4.6 that
many of the instructions are seldom used. Similar results have been reported by other
researchers studying the instruction mix. Our measurements indicate that the
address mode 5 (auto-decrement deferred) is almost never used and a good case
can be made for its elimination. Since our study involves sampling more than 10 million
instructions from 25 independent programs, we expect that the true nature of the
instruction mix for the PDP-11 has been captured. Our results form a data base which
has important applications in the design, implementation and emulation of PDP-11 and
similar processors.

The measurements reveal some anomalies. It can be seen that the variance (σ²) due to
application areas for the JMP instruction is extremely high. A look at the raw data
displayed in the instruction mix by application area table (Table 4.7) indicates that
the application area consisting of scientific Fortran benchmarks shows very high use of
the JMP instruction compared to any other application area. This leads to the large
value of the observed variance caused by application areas for this instruction. The
trouble lies in the particular Fortran compiler ^ which uses the JMP instruction instead
of the branch instructions.


Figure 4.1 Distribution of Instruction Usage
(Instruction Mix Distribution; vertical axis: cumulative percentage)

It is interesting to look at the whole experiment in the light of the workload
characterization problem. Intuitively the different application areas represent different
workloads since it is clear that each of these areas is doing a different kind of 'work'.
The Fortran programs are manipulating numbers for the purpose of solving equations,
the operating systems are performing the processor and memory scheduling functions
whereas the real time systems are responding to the events happening in their
environments. Our experiment is an attempt to characterize these intuitively different
workloads in terms of their instruction mixes. It turns out that a meaningful
characterization at such a low level is not possible due to the variation in the
programs belonging to these areas. This negative result should not be interpreted as
saying that a characterization at a higher level is not possible; in fact, future research
should concentrate on the next higher level of atomic 'work', e.g. in terms of
manipulations of higher level data structures like vectors, lists, process control blocks,
stacks and so on instead of integers, reals, bits and words as was done in this study.


^ We used the FORTRAN-IV compiler, which has since been replaced by FORTRAN-IV+.
5. Multiprocessor Contention for Shared Data

5.1. Introduction

This chapter examines contention for shared data objects in a multi-processor system.
A common problem in a multi-processor system is the contention among processors for
shared resources. This contention can occur at various levels. C.mmp has up to 16
PDP-11s which can independently access 16 memory ports through a cross-point
switch. The lowest level contention therefore occurs at the cross-point switch. If two
or more processors try to access the same memory port at the same time, all but one
of them have to wait. This problem has been studied earlier by Strecker [STRE70],
Bhandarkar [BHAN73], McCredie [MCCR73] and by Baskett and Smith [BASK76]. Fuller
[FULL76] applied these models to specific hardware configurations of C.mmp and
showed that memory interference does not cause severe degradation, i.e. less than 10
percent. On a higher level, there is contention for shared data. The shared data can
be a few bytes in a system table or a large data structure like a linked list or a file.
On a still higher level, there is contention for devices like the line printer or disks and
common software processes for memory management or user message handling. In
this chapter we investigate the contention for the shared data structures in Hydra.

The problem of contention for shared data is somewhat more difficult for a
multiprocessor than for a uniprocessor. In a uniprocessor, system integrity can be
maintained by simple techniques like blocking all interrupts while accessing critical
system tables and by careful coding of the interrupt routines. In a multiprocessor
however, the scheduling and coordination of the individual processors to achieve
parallel operation is a significant problem. One approach is to have a common shared
database which contains all the information necessary for a processor to make its
scheduling decisions. This is the approach taken by Hydra - the kernel of the operating
system for C.mmp. While one processor is examining or updating this database, all
others must be prohibited from accessing or modifying it. There are other shared data
objects as well which are also required to be accessed by only one processor at a
time. The mechanism for such mutual exclusion in the C.mmp/Hydra system is a 'lock'.

A Hydra lock should not be confused with a semaphore since the former is at a more
primitive level than the latter. In Hydra, the locks are used to synchronize access to
small but often frequently accessed shared data objects. The 'lock' and 'unlock'
operations are similar to the P and V operations on semaphores except that when a
processor blocks while trying to set a lock, the process running on the processor is
not context-swapped off the processor. Rather, the processor is simply put in the wait
state (the processor is said to be blocked) until the receipt of an inter-processor
signal (interrupt) notifying it that the lock has been reset.

The lock and unlock operations are used to implement a more general and
sophisticated synchronization mechanism and some message systems. The fundamental
question we sought to investigate in this study was how much processor degradation is
due to contention for shared data objects in Hydra synchronized by the lock/unlock
operations. The amount of degradation will be affected by the number of active
processors, lengths of critical sections (i.e. the instructions executed between a
lock/unlock pair), frequency of lock execution and the distribution of lock/unlock
operations across the different shared data objects.
5.2. Review of Previous Work

The software lockout problem was modelled as early as in 1968 by Madnick
[MADN68]. He considered a simple system consisting of a single critical section and
calculated the mean number of blocked processors as a function of the total number of
processors in the system. McCredie [MCCR73] extended this model to include an
arbitrary number of critical sections in tandem. The results showed that it is
advantageous to have two smaller critical sections instead of a single critical section
which does the work of the two smaller ones. The designers of Hydra have therefore
chosen to have many small critical sections in Hydra.

To the best of our knowledge, experimental verification of any of these models has
not been attempted so far. The software lockout problem cannot be investigated using
software monitors due to the excessive perturbation involved. The fact that a powerful
hardware monitor is necessary for this study is probably the reason why this
important problem could not be examined earlier.

5.3.  Description  of  the  Experimental  Setup 

The K.mon was used in the tracing mode to record the occurrences of all events
related to locks. The data was post-processed to reconstruct the operations on the
various locks and the blocking experienced by the host processor.

A lock is composed of three fields: lock count, sublock and mask. The lock count is
initially 1 indicating that the lock is free. When the shared object becomes locked, the
lock count is -N, where N (N >= 0) is the number of processors waiting for the object.
The sublocks are used to ensure that only one of the waiting processors gets access to
the shared object when it becomes free. The mask field is used to indicate which
processors are blocked on the lock. The schematic code sequences for lock and unlock
are given below:

LOCK:     decrement lock count.
          exit if equal to 0.
BLOCK:    mark self as blocked.
          Turn off all interrupts except the unblock signal.
          wait.
          try for sublock.
          If fails GO to BLOCK.
UNBLOCK:  exit.


UNLOCK:   increment lock count.
          exit if greater than 0.
          initialize sublock.
          Send unblock interrupt to all processors blocked on this lock.
          exit.
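
The bookkeeping implied by these sequences can be traced with a small single-threaded Python model. It is only a sketch under stated assumptions: the class `HydraLockModel` and its method names are hypothetical, the `blocked` set stands in for the mask field, the race for the sublock is reduced to an explicit `grant` call, and the lock count is taken to be 1 when free and -N when held with N processors waiting.

```python
class HydraLockModel:
    """State model of the lock count and mask; no real concurrency."""

    def __init__(self):
        self.count = 1        # 1 means the lock is free
        self.blocked = set()  # processors blocked on the lock (mask field)

    def lock(self, cpu):
        """LOCK: return True if acquired, False if the processor blocks."""
        self.count -= 1
        if self.count == 0:
            return True
        self.blocked.add(cpu)  # BLOCK: mark self as blocked, then wait
        return False

    def unlock(self):
        """UNLOCK: return the processors that get the unblock interrupt."""
        self.count += 1
        if self.count > 0:
            return set()       # nobody was waiting
        return set(self.blocked)

    def grant(self, cpu):
        """The processor that wins the sublock leaves the blocked set."""
        self.blocked.discard(cpu)


lk = HydraLockModel()
got_0 = lk.lock(0)        # acquires the free lock
got_1 = lk.lock(1)        # blocks
got_2 = lk.lock(2)        # blocks
count_held = lk.count     # -2: two processors waiting
woken = lk.unlock()       # both waiters are signalled
lk.grant(1)               # processor 1 wins the sublock
woken_next = lk.unlock()  # processor 1 releases, processor 2 is signalled
lk.grant(2)
woken_last = lk.unlock()  # processor 2 releases, lock is free again
```

Every blocked processor is interrupted on each unlock, but only the sublock winner proceeds; the others return to the wait state, which is why the schematic code loops back to BLOCK.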


The monitor was set up to detect events as follows:

(1) When a 'lock' is attempted, obtain address of the lock and time stamp.

(2) When an 'unlock' takes place, obtain address of the lock and time stamp.

(3) When an 'unblock' takes place, obtain the time stamp.

The first two events give the critical section time when the attempt for the lock is
successful. Whenever event 3 happens, it is always after event 1. It indicates that
the attempt for lock was unsuccessful and that the processor was 'blocked'. The
address of the lock is obtained from the previous event 1. The blocked time is
determined from the time stamps of event 1 and event 3.
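
The post-processing step can be sketched in Python as follows. This is an illustration, not the program used in the study: the event tuple format and the function name `reconstruct` are hypothetical, and it assumes that when a processor was blocked, its critical section is taken to start at the unblock event.

```python
def reconstruct(events):
    """Rebuild critical section times and blocked times from a trace.

    events: list of (kind, time, address) for one processor, where kind is
    "lock", "unlock" or "unblock"; unblock events carry no address.
    """
    sections = []     # (lock address, time inside the critical section)
    blocks = []       # (lock address, time spent blocked)
    entered = {}      # lock address -> time the section is taken to start
    last_lock = None  # most recent event 1, used to resolve event 3
    for kind, t, addr in events:
        if kind == "lock":
            last_lock = (addr, t)
            entered[addr] = t
        elif kind == "unblock":
            blocked_addr, t_attempt = last_lock
            blocks.append((blocked_addr, t - t_attempt))
            entered[blocked_addr] = t  # assumed start of the section
        elif kind == "unlock":
            sections.append((addr, t - entered.pop(addr)))
    return sections, blocks


# Times in microseconds; lock addresses are made-up octal values.
trace = [
    ("lock", 0, 0o1000),
    ("unlock", 300, 0o1000),
    ("lock", 1000, 0o2000),
    ("unblock", 1400, None),
    ("unlock", 1650, 0o2000),
]
sections, blocks = reconstruct(trace)
```

In this made-up trace the first lock is obtained at once and held for 300 microseconds, while the second attempt blocks for 400 microseconds before a 250 microsecond critical section.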

We used three benchmarks to generate loads on the system. Each experiment
consisted of running one benchmark by itself and collecting the output of the hardware
monitor for post-processing. All the benchmarks create about 16 different processes,
each executing the same program. The processes do different amounts of computation,
synchronize with each other and repeat. Benchmark 1 and 2 are two versions of a
parallel program to find the roots of a transcendental equation [8]. They use two
different types of semaphores for synchronization among themselves. The third
benchmark is a synthetic program which executes various kernel calls, intermixed with
small amounts of user level computing. A fourth measurement was made during the
usual user hours to give the frequency of usage of various locks for the current
typical user load. At the present time, C.mmp is not heavily loaded during general user
sessions and measurements of C.mmp under near saturation conditions will have to
wait until general usage of C.mmp increases.

5.4. Locks in C.mmp/Hydra environment

The kernel of the operating system for C.mmp is known as Hydra. It has been
described extensively in [WULF76]. We briefly summarize here the pertinent
information. Hydra solves the processor scheduling and coordination problem by
maintaining global data structures containing information regarding the status of
processors and feasible processes. Locks have to be associated with the objects in this
database. Apart from these locks, there are locks on other shared objects. Examples
of an object are a page, a semaphore, a process or a file. Every object has one or
more locks associated with it for accessing different parts of the object. Since there
are thousands of objects in Hydra, there are also thousands of locks.

Not all locks are, however, heavily used. In our experiments, we observed of the
order of 5 frequently used locks and a number of lightly used locks. One of the most
frequently used locks is a 'feasible queue' lock. Processes which are ready to run are
placed in one of eight feasible queues waiting for a processor to become free.
Currently only two of the eight queues are being used, the first one for regular
processes and the eighth one for high priority processes. Another frequently used lock
is on the 'processor list' which is a list of all 16 processors containing information
about their status. Every time a processor becomes free, it goes through the feasible
queues and the processor list to determine which processor should work on the next
process. The next important lock is on the free core list. It is the lock on a list of free
physical page addresses. This list is used when pages are to be swapped out or
brought in. A similar lock for storage inside the kernel is called the 'kernel storage
lock'. The lock on 'stop mailbox' is used by the policy module ^ to communicate with
the kernel. This lock is accessed every time a process has to be started or stopped.
The 'KMPS lock' is the lock on a free core list which is used by the scheduler to
allocate and deallocate fixed size blocks for process information. It is used every time
a process is started or stopped or when a message is sent to a process.

There is another interesting lock called 'lock on a page'. Every data page (the size
of a page is 8K bytes in C.mmp) in Hydra has a lock associated with the whole page
and this lock is always at a fixed offset from the beginning of a page. Since there are
so many pages in the system, they have to be overlayed through a relocation register.
Due to the well-known deficiencies of a hardware monitor using information internal to
an operating system (or other software systems), it is not possible to pinpoint the
particular page that has been locked even when the lock is detected at the proper
offset from the beginning of the page. All one can say is that some data page has
been locked. The result is that even though locks under the common heading of 'lock
on a page' are accessed a large number of times, the number of times a processor has
to wait on such a lock is much less than if there were only one 'lock on a page' for all
pages. This phenomenon has to be treated in a special way when developing a model.


^ A policy module makes the decisions regarding resource allocation to user programs.
In addition to the above, there are a number of locks which are used very
infrequently. Most of them occur in the overlay data pages and so the hardware
monitor is not able to precisely determine which locks are being used. We grouped all
these locks under the heading 'Miscellaneous'. These locks also have to be modelled in
a special way.

The processors on C.mmp are non-homogeneous. Some are PDP-11/40's and some
are PDP-11/20's. Also, some have I/O devices and some do not. However, since the
hardware monitor can monitor only one processor at a time, we were constrained to
measure only one processor. There is a software tracer available on Hydra, which can
be used to monitor all processors at once. It is, however, not suitable for studying
critical sections since recording an event with the tracer takes about as much time as a
typical critical section. The perturbation introduced by tracing is unacceptable for this
study. In our experiments, we had more processes than processors, so all the
processors were busy most of the time. We expect all the processors to exhibit
behavior like the measured processor as far as critical sections are concerned. Table
5.1 presents the measurements for the three programs.
Table 5.1 Measurement of the Locking Behavior in Hydra

Columns: Program 1; Program 2; Synthetic load of program 3; General multiuser session

1. Average kernel instructions (footnote 10) between two successive locks

2. # of locks detected

3. Freq. usage of specific locks:
     Processor list lock
     Feasible queue 1
     Feasible queue 8
     Lock on a page
     Core lock
     Stop mail box
     KMPS lock
     Miscellaneous

4. Average time inside a critical section (microsec):
     Processor list lock
     Feasible queue 1
     Feasible queue 8
     Lock on a page
     Core lock
     Stop mail box
     KMPS lock
     Miscellaneous
     Average for all locks

Run dependent data:

5. Number of active processors

6. Total time of measurement (millisec.)

7. Total # of times locked

8. Total # of times blocked

9. % of locks that blocked

10. % time spent in kernel

11. Average time (microsec) between locks

12. % time spent in the blocked state

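Values for items 1 to 3 (Program 1, Program 2, Program 3, General session):

 1. Kernel instructions between locks   413       224       515
 2. # of locks detected                 53        79        181
 3. Processor list lock                 0.1584    0.3007    0.1151    0.3420
    Feasible queue 1                    0.1184    0.2829    0.1050    0.05995
    Feasible queue 8                    0.0338    0.0056    0.0       0.0028
    Lock on a page                      0.1723    0.0       0.3943    0.234
    Core lock                           0.0457    0.0       0.0544    0.0614
    Stop mail box                       0.0826    0.0       0.004     0.022
    KMPS lock                           0.0927    0.0       0.005     0.0
    Miscellaneous                       0.2961    0.4108    0.2523    0.2312

Values for item 4, in the order they were extracted: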

348 

409 

378.5 

191.5 

239 

259.5 

156 

168.5 

338 

— 

430.5 

557.5 

30, '.4 

(>84.5 

282 

264.4 

297 

108.6 

123 

134 

317.5 

461 

441 

279 

378 

279 

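Values for items 5 to 12 (Program 1, Program 2, Program 3):

  5. Number of active processors              13       14       12
  6. Total time of measurement (millisec.)    17393    32924    20255
  7. Total # of times locked                  2955     5041     4360
  8. Total # of times blocked                 130      577      146
  9. % of locks that blocked                  5.5%     11.7%    6.1%
 10. % time spent in kernel                   61.8%    16.9%    37.7%
 11. Average time (microsec) between locks    5888     6531     4646
 12. % time spent in the blocked state        0.29%    0.83%    0.74%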

10 For the processor under measurement, the average instruction execution time was 2.8 microseconds.


5.5. The Model

Earlier studies of the software lockout problem had been carried out using a model with
critical sections occurring in tandem. Our measurements on Hydra indicated that the
locking behavior in Hydra approximates a model with critical sections in parallel instead
of in tandem. Figure 5.1 displays the transition matrix for lock accesses observed for
one program on Hydra. It lists the number of times a lock was called within 100
microseconds of exiting the previous lock. It can be seen that very few of the
transitions occur in tandem.


Figure 5.1 Transition Matrix for Lock Accesses

[Matrix with one row per lock, giving the lock name, the total # of times the lock
was used, and the number of transitions to each of locks 1 to 10. The rows include
Feasibility queue 1, Processor list, Feasibility queue 8, Stop mailbox, Core lock,
Lock on a page, I/O system lock and Miscellaneous lock.]


Non-critical section execution


M - 1 locks (critical section execution)


$1/\mu_1$ = Average time between locks on a single processor

$1/\mu_i$ = Average time in the critical section ($2 \le i \le M$)

$M_{misc}$ = Number of miscellaneous locks being modelled

$p_i$ = Probability of attempting the $i$-th lock ($2 \le i \le M - M_{misc}$)

$p_{misc} = (1 - p_2 - \dots - p_{M - M_{misc}}) / M_{misc}$


Figure 5.2. The Central Server Model
A simple central server model ([BUZE73]) was used to model the
contention (see figure 5.2). In our model, the N customers are the processors in C.mmp,
the service site 1 is the non-critical section execution and service sites 2 to M are the
different locks. The mean service time of server 1 is the mean time between locks on
one processor. The mean service time of all the other servers is the mean critical
section time for the corresponding lock. The probability p_1 = 0 and p_2 to p_M are
given by the relative frequencies of use of the different locks. Since we have N
processors, each of which is entering a critical section with rate μ_1, server 1 was
made a load dependent server so that its service rate is μ_1·n_1 where n_1 is the
number of customers at site 1.

The locks classified as 'lock on a page' are modelled as a multiserver. The number
of subservers is adjusted till the predicted blocked time for these locks agrees with
the measured block time. When we tried to model the 'Miscellaneous' locks as a
multiserver, the model predicted more blocked time than what was observed for all the
miscellaneous locks, even with the number of subservers equal to the number of
processors. We therefore modeled the miscellaneous locks as M_misc locks in parallel
such that each was attempted with equal probability and each has the same mean
service time equal to the mean critical section time for miscellaneous locks as shown in
figure 5.2. M_misc is adjusted until the predicted blocked time agrees with the
observed blocked time.

The model is used to calculate the time lost due to blocking which can be
interpreted as the processor power lost due to blocking. Consider the blocking at the
i-th lock (server) for a moment. When two processors are present at the server, one is
actually receiving service and the other is waiting. When three processors are at the
server, two processors are waiting.

Buzen gives a computationally efficient method for calculating the probability of k
customers being present at the M-th server, P(n_M = k), in the load dependent server
case.

\[ P(n_M = k) = \frac{X_M^k}{A_M(k)} \cdot \frac{g(N-k, M-1)}{G(N)} \]

where the X, A, G and g are the same as those defined in [BUZE73].

By  permuting  the  numbers  assigned  to  the  locks,  one  can  gel  the  probability  of  n 

customers  benng  present  at  the  server  (or  0  <  n  <  N  and  2  <  i-  <  M. 


Fraction  of  the  time  lost  due  to  blocking  at  the  server  is 

« 

lostj  -  P<nj.-2)  +  P(nj-3)*2  +  P(nj=-/1)f3  +  . . .  +  P{nj-N)-.KN"1).  (  2  <  /  <  M> 

It  llien  follows  that, 

Total  fraction  of  time  lost 

due  to  blocking  ■»  ^  lost: 

In order to validate the model, we have to calculate the blocked time as seen by one
processor since the hardware monitor measures only one processor. Since all
processors are assumed to be identical, this is just the total blocked time divided by
the number of processors N.
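As a small worked illustration of the blocking calculation, the sketch below computes the loss at one lock from a queue length distribution, sums it over the locks and divides by N to get the figure seen by one processor. The function names and the sample distribution are hypothetical and are not measurements from this study.

```python
from typing import List, Sequence


def fraction_lost(distribution: Sequence[float]) -> float:
    """distribution[k] = P(n_i = k); with k processors present, k - 1 are waiting."""
    return sum((k - 1) * p for k, p in enumerate(distribution) if k >= 2)


def total_fraction_lost(distributions: Sequence[Sequence[float]]) -> float:
    """Sum the loss over all locks (servers 2..M)."""
    return sum(fraction_lost(d) for d in distributions)


def lost_per_processor(distributions: Sequence[Sequence[float]], n_processors: int) -> float:
    """Blocked time as seen by one of N identical processors."""
    return total_fraction_lost(distributions) / n_processors


# Hypothetical distribution for one lock and N = 3 processors.
example: List[float] = [0.5, 0.3, 0.15, 0.05]
```

For the example distribution the loss is 0.15 * 1 + 0.05 * 2 = 0.25 processors on average.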
Figure 5.3 Validation of the Central Server Model

                                   Program 1          Program 2          Program 3

Measurement

Number of active
processors (N)                     13                 14                 12

Percent time lost
on one processor at
specific locks
measured [predicted]

processor list lock                0.1249 [0.105]     0.4991 [0.5119]    0.1439 [0.0887]
feasible queue 1                   0.0143 [0.0167]    0.2577 [0.1385]    0.0814 [0.0335]
feasible queue 8                   0.0163 [0.0087]    0.0051 [0.0023]
ton; lock                          0.0152 [0.0212]                       0.0698 [0.0637]
stop mail box                      0.0066 [0.0177]
Lock on a page                     0.0697 [0.1182]
KMPS lock                          0.0023 [0.0032]
Miscellaneous                      0.0490 [0.0352]    0.0687 [0.1253]    0.4393 [0.4667]

Total percent time
lost for one processor             0.298 [0.325]      0.831 [0.776]      0.735 [0.653]
Figure 5.4 Predictions of the Model

(a) Increasing number of processors:

    Total percent time lost on one processor for a
    32 processor system         0.9593     2.5c%3
    40 processor system         1.3011     '^.9816
    48 processor system         1.7033     6.1505

(b) Using only one lock for all miscellaneous critical sections:

    Total percent time lost on one processor for a
    N* processor system         1.18       1.99
    32 processor system         5.42       8.73
    40 processor system         9.70      15.42
    48 processor system        17.10      25.00

(c) Eliminating the processor list lock:

    Total percent time lost on one processor for a
    N* processor system         0.31       0.56
    32 processor system         0.87       1.66
    40 processor system         1.14       2.36
    48 processor system         1.43       3.29

* N = 13, 14 and 12 respectively for the three programs.
Figure 5.3 displays the measured and predicted percentage of blocked time. The
agreement between the measured and predicted values is fairly good for program 1
but for programs 2 and 3, the actual contention for the feasible queue 1 and the
processor list is more than what is predicted. We do not yet know why the actual
contention is so large. Programs 2 and 3 involve synchronization among many
cooperating processes which become feasible almost simultaneously. This results in
some amount of temporal correlation among processors when they attempt to access
the feasible queues and the processor list.

The model can be easily extended for different numbers of processors (see figure
5.4 (a)). We assume that the number and characteristics of the locks do not change
when the number of processors is altered. This might not be entirely valid since the
miscellaneous locks will undoubtedly be more diverse with more processors and hence
the number of parallel miscellaneous locks might have to be increased for better
predictions. We decided not to alter it, however, since using the same number will
give us a worst case bound on the blocked time. It can be seen that Hydra has done a
very good job of partitioning the shared objects into different critical sections. The
processing power lost is very small even for the 48 processor case. Figure 5.5
displays the effects of the blocking for program 3.

We can also investigate the effects of reducing the number of locks in Hydra. If all
the 'miscellaneous' critical sections in Hydra were to be executed using just one lock,
considerable saving in storage will result since each object will not have to contain
space for a lock. Our model predicts (see figure 5.4(b)) that the performance penalty
of this change will be small for a 16 processor system. For larger systems, it will still
be advisable to have separate locks in each object. In the current implementation of
Hydra, it is possible to achieve the necessary mutual exclusion without the use of the
'processor list' lock. In figure 5.4 (c) we display our predictions if this lock is
eliminated. The time lost due to blocking will reduce as would be expected.

Figure 5.5 Effects of Blocking for Program 3
(plot of Effective Processors against Number of Processors)

5.6. Conclusions

We have presented our measurements on the multiprocessor contention in accessing
shared data. The measurements indicate that less than 1 percent time is lost due to
blocking in Hydra. This negligible amount of degradation is a result of partitioning the
shared data objects into small segments thereby reducing the critical section times. It
should be noted that Hydra uses the locks for synchronization at only one of several
levels. At higher levels in the system, semaphores and other message operations are
used for synchronization which do result in context switch overhead but do not cause
any loss of time due to blocking. From a purely performance point of view, Hydra
could have used fewer locks with longer critical sections and it would still have had
acceptable performance. This is an important result for the designers of future
multiprocessor operating systems as well as for those trying to adapt a uniprocessor
operating system to a multiprocessor.

We have also presented a central server model which predicts the observed
blocking behavior reasonably well. The model is used to extrapolate the blocking
behavior in systems with up to 48 processors. Even at 48 processors, the degradation
due to lock contention appears to be small. Other interference problems at higher
levels in the software [FULL76b] will be the limiting factors.

In any real multiprocessor operating system, the actual locking behavior will usually
deviate from the simple central server model presented here. For example, some locks
might always be executed in a certain sequence or some locks may be nested inside
other locks. These situations might have to be modelled as a network of queues. Our
study does not point out any major deviations from a central server model for the
locking behavior of Hydra, but the assumptions underlying our model will have to be
verified for other operating systems.
6. Other Experiments Performed Using K.mon

In chapter 2 we presented an enumeration of the interesting parameters that need
to be studied in a general purpose computer system (see figure 2.1). Many of these
parameters are relevant for our evaluation of C.mmp and Hydra. However, it is not
possible to investigate in depth all the performance parameters of a complex
multiprocessor system like C.mmp in a short time. We performed many experiments in
our study of these parameters but we discovered that we did not have all the special
tools needed for an effective evaluation of a multiprocessor system. In this chapter,
we present many specific experiments that were performed using the tools we had. It
is easy to get bogged down in the details of experimental setup in such a case study.
We have tried to avoid this problem by giving a brief description of the setup for each
experiment and describing the goal of the experiment and the interpretation of the
result in more depth. The experiments are discussed along the system levels presented
in chapter 2. The system levels are:

1.  Hardware  architecture 

2. Operating System design

3.  Systems  programming 

4.  Applications  programming 

5.  Installation  management 

6.1. Measurements at the Hardware Architecture Level

K.mon is ideally suited for measurements at the hardware architecture level by
virtue of the fact that it is capable of monitoring every cycle on the Unibus.
Moreover, its sophisticated event detection mechanism can be used to select only the
interesting cycles, thereby removing the need for recording every Unibus cycle for
post-processing. Most of our experiments at this level are directed at the evaluation of
C.mmp. The measurements discussed here are:

1. study of memory interference in C.mmp

2. quantification of the effects of the small address space on Hydra

3. measurements of the types of memory accesses

4. a complete cycle by cycle trace

6.1.1 Memory Interference in C.mmp

C.mmp consists of up to 16 processors connected to as many as 16 memory ports
via a cross-point switch. This arrangement leads to contention in the switch when two
or more processors attempt to access one memory port at the same time. This
problem has been studied earlier by Bhandarkar [BHAN73], McCredie [MCCR73] and by
Baskett and Smith [BASK76] using analytical models. Our approach here is to actually
measure the effects of contention in C.mmp when it is executing different workloads.

One large manufacturer estimates that for each additional processor in its
multiprocessor system, 10 percent of the additional processing power is lost due to
memory contention. The loss of processing power depends on many factors: the access
and cycle times of the memory, the time taken by a processor to issue another main
memory request after one is satisfied, the distribution of memory accesses to the
different ports and the amount of I/O traffic. For the overall system, these factors
cannot be measured using K.mon since it can monitor only one processor at a time.
Moreover, since the memory subsystem in C.mmp operates at a faster rate than the
Unibus, we could not use K.mon (designed for a Unibus) to monitor the memory
subsystem directly. We could only monitor secondary parameters like the length of a
memory cycle as seen by a processor.

Footnote (on I/O traffic): For our study we could ignore the I/O traffic since it is not
significant. However, if the processors are equipped with cache memories, the processor
to memory traffic is reduced and the I/O traffic then becomes significant. The I/O
traffic has a peculiar characteristic of accessing consecutive words and its effect needs
to be considered carefully, especially for non-interleaved memories.

The average length of a cycle increases with the contention in the switch.
Unfortunately, we could not measure the length of a cycle directly since K.mon is not
equipped with a high speed clock necessary for such a measurement. Instead, K.mon is
provided with six one-shot flip-flops which change their state at prespecified times
(0.5, 1, 2, 5, 14 and 50 microseconds respectively) after a memory cycle is initiated. By
examining the value of these flip-flops at the end of a memory cycle, we can determine
the time bracket (or a bin) into which the length of that memory cycle falls. This in
effect generates a crude histogram of memory cycle lengths and it gives an indication
of the contention. Even though individual memory cycle lengths are not available, we
can use the mean time value of a bin to approximate the average cycle length of all
the cycles falling in that bin to yield the grand average of the time taken to complete a
memory cycle.

Our experimental study attempts to quantify the extent of memory contention. We
calculated the average cycle length for three different workloads:

1. Idle machine: This measurement was made as a basis for comparison with the
other workloads.

2. XSEARCH: This is a root finding program which creates 16 cooperating processes
executing in parallel. All these processes execute the same code but they all
have individual copies of the code pages. Their activity is therefore distributed
throughout the memory ports. This workload is expected to produce
approximately the same contention as many independent users executing
different programs resulting in heavy use of most of the processors. The
average cycle time is naturally larger than the previous workload (idle machine).

3. SEARCH: This is the same program as XSEARCH, except all the 16 processes
share the code page, that is, they all make instruction fetches from the same
memory port. A large amount of memory switch contention is to be expected
with this workload.

We sampled 100,000 cycles at random for each of these three workloads. To simplify
the experiment, K.mon was set up to measure the length of every memory cycle
generated by the P.host. However, since K.mon's output rate is less than the main
memory access rate of a PDP-11, the internal buffers of K.mon overflowed after
collecting the cycle length for about 160 cycles. This gave rise to windows of
measurement occurring after variable times thus effectively randomizing the
measurements. The cycle length histograms produced by K.mon are given below:
Figure 6.1 Length of a Memory Cycle

Cycle length          Workload 1     Workload 2     Workload 3
(microseconds)

0.0 - 0.5                      0              0              0
0.5 - 1.0                  88439          85134          69453
1 - 2                      11404          13876          11601
2 - 5                         71            958           3344
5 - 14                        79             31          15421
14 - 50                        7              1            181
above 50                       0              0              0

Average length            0.8466         0.8834          2.335
(microseconds)
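The grand average can be reproduced from the histogram by weighting the midpoint of each bin by its count. The sketch below is an illustration only; the names are hypothetical, and the counts are as read from Figure 6.1, where a few digits are hard to read in the scan and were taken so that each column adds up to the 100,000 sampled cycles. The "above 50" bin is empty for all three workloads and is left out because it has no midpoint.

```python
from typing import Dict, List, Sequence, Tuple

# Bin boundaries in microseconds, set by the six one-shot flip-flops.
BINS_US: List[Tuple[float, float]] = [
    (0.0, 0.5), (0.5, 1.0), (1.0, 2.0), (2.0, 5.0), (5.0, 14.0), (14.0, 50.0),
]

# Cycle counts per bin for the three workloads (as read from Figure 6.1).
COUNTS: Dict[str, List[int]] = {
    "workload 1": [0, 88439, 11404, 71, 79, 7],
    "workload 2": [0, 85134, 13876, 958, 31, 1],
    "workload 3": [0, 69453, 11601, 3344, 15421, 181],
}


def average_cycle_length(counts: Sequence[int]) -> float:
    """Approximate the mean cycle length using the midpoint of every bin."""
    total_cycles = sum(counts)
    weighted = sum(count * (low + high) / 2.0
                   for count, (low, high) in zip(counts, BINS_US))
    return weighted / total_cycles


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        print(f"{name}: {average_cycle_length(counts):.4f} microseconds")
```

With these counts the midpoint rule agrees with the averages listed in the last row of the figure to within rounding.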

It is interesting to note the significant tail for column 1 (the idle machine). Such a
tail can arise only due to memory contention. In our data this effect is more
pronounced because in C.mmp, memory conflicts are arbitrated according to strict
hardware priority of the processors and our measurements were made on a processor
having the lowest priority.

It can be seen that the contention for the second workload is quite small. This
workload is expected to cause the processors to distribute their memory accesses
uniformly across all the memory ports. Simple queueing models using a processor
service time of 1.2 microseconds and a memory service time of 0.8466 microseconds
(using the average from column 1), give the expected average waiting time when all
the 16 processors are active to be about 1.2 microseconds. One explanation of the
small increase in the average waiting time observed in our study is that the length of a
cycle is composed of a fixed arbitration time, a fixed cable delay and a variable waiting
time for actual memory service. When two or more processors request service from a
port, their arbitration times are overlapped and thus do not contribute to any
lengthening of their cycles. Another explanation is that in workload 2, each of the 16
processors is provided with independent instruction and data pages and it is possible
that the whole system divides itself into 16 loosely coupled partitions leading to a low
interference. K.mon cannot be used to verify these explanations since it is not capable
of measuring memory arbitration time or the memory access behavior of all the
processors at once. It is, however, an interesting problem and we hope it will be
studied in later investigations.

6.1.2 Small Address Space Problem on PDP-11

Since the PDP-11 has a small address space (64 K bytes), C.mmp uses a set of
relocation registers to access its large physical memory. The virtual address space of
a PDP-11 is divided into 8 pages of 8 K bytes each for this purpose. There are 8
relocation registers (henceforth called RR's) corresponding to these 8 pages. The RR's
are used to translate a virtual address into a physical address [WULF72]. C.mmp
utilizes 4 operating modes (kernel, user, I/O and unused) with 8 RR's each.
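The page arithmetic described above can be sketched as follows. The split of a 16 bit virtual address into a 3 bit page number and a 13 bit displacement follows directly from 8 pages of 8 K bytes. How a RR encodes the physical page is not described in this section, so the sketch simply assumes that each RR holds a physical page frame number; that encoding and all names are hypothetical.

```python
from typing import Sequence, Tuple

PAGE_BITS = 13                      # 8 K bytes per page
PAGE_SIZE = 1 << PAGE_BITS
NUM_PAGES = 8                       # 8 pages cover the 64 K byte virtual space


def split_virtual_address(vaddr: int) -> Tuple[int, int]:
    """Return (page number, displacement within the page)."""
    if not 0 <= vaddr < NUM_PAGES * PAGE_SIZE:
        raise ValueError("virtual address must fit in 16 bits")
    return vaddr >> PAGE_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr: int, relocation_registers: Sequence[int]) -> int:
    """Map a virtual address to a physical one (assumed RR = page frame number)."""
    page, displacement = split_virtual_address(vaddr)
    return relocation_registers[page] * PAGE_SIZE + displacement


# Hypothetical register contents: page 5 is mapped to physical frame 37.
example_rrs = [0, 1, 2, 3, 4, 37, 6, 7]
```

Changing one entry of the register list is all it takes to make a different physical page visible, which is the operation whose cost is measured in this section.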

C.mmp thus provides an environment for writing large programs and gives us the
unique opportunity of observing how the small address space of a PDP-11 affects the
execution of large programs. In any large program, some time has to be devoted to
maintaining the RR's in order to make different pages of the program accessible. The
time spent doing this is clearly the price one has to pay for writing large programs to
run on a small address space machine. Our experiment is designed to study and
quantify this cost.

We used the Hydra system itself as our test program. Even though the cost of using
a machine with a small address space depends on the manner in which the large
physical memory is used, we believe that Hydra executing a reasonable workload is a
good example of a large program using a small address space. Hydra consists of about
f)0 pages of instructions and data. It uses the Kernel RR's to access these pages in
the following manner:

Figure 6.2 Kernel mode Relocation Registers

RR    Function

0     fixed, stack page
1     fixed, common data page
2     overlay, data page
3     overlay, data page
4     overlay, instruction page
5     fixed, common instruction page
6     fixed, local memory
7     fixed, I/O page

Note that only RR4 is used to access the overlayed instruction pages and RR2 and
RR3 are used for overlayed data pages.

The experiment is conducted in three parts, each giving successively detailed
pictures. In the first part, all accesses (read or write) to the Kernel RR's were
counted over the duration of a second along with all cycles taking place on the
machine, all kernel cycles and all kernel instructions. This gives a general estimate of
the cost since any access to a RR is a direct result of an attempt to gain accessibility
to a new page. In part 2, we trace individual accesses to RR2, RR3 and RR4 instead of
just counting them. This is used to see if accesses to the RR's are uniformly
distributed in time throughout the Kernel execution or bunched together. Moreover,
since the data being read/written is also traced, it is possible to identify redundant
writes, that is, when the same value is written back into a RR. Finally, in part 3, we
trace each cycle taking place on P.host to determine how many instructions and cycles
are actually associated with an access to a RR and to study in general the reasons
behind changing the RR's.

Footnote (on the use of the large physical memory): For example, the cost is less if
the memory is used to store large vectors and all words in a page are scanned before
switching to another page.

Footnote (on accesses to a RR): It is important not to confuse an access to a RR from
the program for the purpose of inspecting or modifying it with an access from the
hardware machine for the purpose of translating a virtual address. Here we are
concerned with the former.
Experimental Configuration:

Processor: C.mmp processor 3 (PDP-11/40).

Workload:  Hydra executing XSEARCH (see section 6.1.1)


PART 1


Figure 6.3 The Rate of RR Accesses.


The counts for the following four quantities are provided for
sixteen one second intervals:


All            Kernel      Kernel          Accesses     Instructions
cycles         cycles      instructions    to RR's      per access

3‘^.30O0       107645      45226           2913         15.52
363s3:-.’!s    177341      73433           5130         14.31
3^00.:>fi       78579      33568           1843         18.21
35090'!        132763      55957           3438         16.27
30SC78         115049      48256           3099         15.57
333357          78218      33258           1837         18.10
320382          77544      33161           1798         18.44
3 'll! 202      86569      36759           2072         17.74
3 J 2000        92972      39239           2359         16.63
^01517         164851      68575           4729         14.50
3313/177       117706      49366           3196         15.44
358237          82876      35238           1980         17.79
371435         160413      66763           4595         14.52
347500          77400      32973           1824         18.07
372426         167739      69702           4768         14.61
400289         165853      68953           4653         14.81

The average number of instructions between RR accesses: 16.28
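The last column and the overall average can be recomputed from the counts. The sketch below is an illustration with hypothetical names; the kernel instruction and RR access counts are as read from Figure 6.3, with unclear digits chosen so that each ratio matches the printed "instructions per access" value.

```python
from typing import List, Tuple

# (kernel instructions, accesses to RR's) for the sixteen one second intervals.
INTERVALS: List[Tuple[int, int]] = [
    (45226, 2913), (73433, 5130), (33568, 1843), (55957, 3438),
    (48256, 3099), (33258, 1837), (33161, 1798), (36759, 2072),
    (39239, 2359), (68575, 4729), (49366, 3196), (35238, 1980),
    (66763, 4595), (32973, 1824), (69702, 4768), (68953, 4653),
]


def instructions_per_access(instructions: int, accesses: int) -> float:
    """Kernel instructions executed per RR access in one interval."""
    return instructions / accesses


def mean_instructions_per_access(intervals: List[Tuple[int, int]]) -> float:
    """Average of the per-interval ratios (not the ratio of the totals)."""
    ratios = [instructions_per_access(i, a) for i, a in intervals]
    return sum(ratios) / len(ratios)


def overhead_fraction(mean_ratio: float, instructions_per_rr_access: int = 1) -> float:
    """Execution overhead if every RR access costs the given number of instructions."""
    return instructions_per_rr_access / mean_ratio


if __name__ == "__main__":
    mean_ratio = mean_instructions_per_access(INTERVALS)
    print(f"mean instructions per access: {mean_ratio:.2f}")
    print(f"overhead: {100 * overhead_fraction(mean_ratio):.1f} percent")
```

Note that the average is taken over the sixteen interval ratios; dividing the total instruction count by the total access count would give a somewhat lower figure because the busy intervals carry more weight.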


It can be seen that a RR is accessed on an average of every 16.28 Kernel
instructions. Assuming one instruction per access, this corresponds to an overhead of
about 6 percent. It should be noted that this overhead considers only the execution
penalty. However, the small address space forces Hydra to use 32 bit addresses
internally (giving the RR value and the displacement within the page) to access many
routines and data objects. We expect the storage space overhead to be significant
even though it did not form a part of our study. The next question is to determine if
all the three variable RR's get accessed with the same frequency. This is done in part
2.


PART 2

Figure 6.4 The Accesses to Individual RR's.
  Cycle          Relocation     Kernel instructions
  type           register       since last access

  Write          RR3
  Write          RR3            11
  Write          RR4            69
  Write          RR2             2
* Write          RR2            13
  Write          RR4            56
                 RR3            31
  Write          RR3             2
  Write          RR3             9
  Write          RR4            25
* Write          RR4            36
  Write          RR4            37
                 RR4            34
  Read           RR4            24
  Write          RR4             1 ; exchange
  Read           RR2            23
  Write          RR2             1 ; exchange
  Read           RR4             9
  Write          RR4             1 ; exchange
  Write          RR3            33
  Read-pause     RR3             6
  Write          RR3             0
  Write          RR4            20
  Write          RR2             5
* Write          RR2            35
  Read           RR2             7
  Read           RR4            26
  Write          RR2             6
* Write          RR2            34
  Write          RR3            12
  Read           RR3             4
  Write          RR2             0
  Read           RR2             8
  Read           RR4            24
  Read           RR4            24
  Write          RR4             1 ; exchange
  Write          RR3            44
  Read           RR3            97
  Write          RR4            13
  Read           RR3             3
  Read           RR2             3
  Read           RR3             1
  Read           RR4             3
  Read           RR4             1

* indicates a redundant write

Accesses to RR2: 12

Accesses to RR3: 14

Accesses to RR4: 17

Average kernel instructions between RR access: 18.00

It can be seen that the measurements on this level yield an average of about 18
Kernel instructions per RR access. Comparing it with 16.28 obtained in part 1, one can
infer that the accesses to the RR's are distributed uniformly in time throughout the
Kernel execution. Quite a few of the 'write' accesses are seen to be redundant. These
arise because it is easier to write the required value in a RR than to check if the RR
already has the same value. It is interesting to observe that the accesses are almost
evenly spread across the three RR's. If one of these three was found to be lightly
used, it might have been advisable to use it permanently to access the most heavily
used overlayed page.
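Identifying redundant writes in such a trace is a simple bookkeeping exercise, sketched below. The record layout (cycle type, register, value) and the sample values are hypothetical; Figure 6.4 does not list the data values that were traced.

```python
from typing import Dict, List, Sequence, Tuple

Access = Tuple[str, str, int]  # (cycle type, register name, value read or written)


def find_redundant_writes(trace: Sequence[Access]) -> List[int]:
    """Return the indices of writes that store the value a RR already holds."""
    known: Dict[str, int] = {}      # last value seen for each register
    redundant: List[int] = []
    for index, (kind, register, value) in enumerate(trace):
        if kind == "write" and known.get(register) == value:
            redundant.append(index)
        # Both reads and writes reveal the value now held in the register.
        known[register] = value
    return redundant


# Hypothetical trace fragment; the values are made up for illustration.
sample_trace: List[Access] = [
    ("write", "RR2", 0o400),
    ("write", "RR2", 0o400),   # same value written again: redundant
    ("read", "RR4", 0o120),
    ("write", "RR4", 0o120),   # writes back what was just read: redundant
    ("write", "RR3", 0o007),
]
```

For the sample fragment the function reports the accesses at positions 1 and 3.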

It is interesting to consider if the overhead of relocation register maintenance could
have been reduced if an exchange instruction were available for the PDP-11. The
purpose of this instruction would be to store the current value of a RR on the stack
and to replace it with another value. In figure 6.4 we have marked all the RR accesses
that could have been saved if such an exchange instruction were available. It can be
seen that 4 out of the total 43 accesses could have been saved. This would have
resulted in increasing the average number of kernel instructions between successive
RR accesses to 19.9 instructions.

PART 3

We traced two different processors for this part to study the low level operations
leading to a RR access. Processor 3 on C.mmp has no I/O devices and the Kernel
execution on it is limited to executing the Kernel calls. Processor 0, on the other hand,
has many I/O devices and it executes many interrupt routines. The two processors
indeed exhibit different traces. K.mon can trace only a small number of cycles before
overflowing. This gives rise to a rather small window in the Kernel execution to make
any general comments regarding why and how the RR's are accessed. Figure 6.5(a)
displays a trace for processor 0 to illustrate the instruction sequences leading to a RR
access. Figure 6.5(b) displays a part of a similar trace for processor 3.

The traces reveal two different kinds of overheads associated with maintaining the
RR's. In figure 6.5(a), 10 instructions (6 through 14 and 21 through 22) are required to
call a subroutine in an overlay page. Figure 6.5(b) shows that 3 instructions are
required to gain accessibility to an overlay data page and one instruction is required
to switch back to the previous data page. In parts 1 and 2 we only looked at the cost
associated with RR accesses, that is, we did not explicitly consider the additional cost
associated with calling a subroutine residing in an overlay page. The overall cost will
depend on how many times such subroutines are called. We do not have sufficient
data regarding that to draw any general conclusions. However, in appendix D we
provide an execution trace of Hydra which shows that 3.5 times more instructions are
executed in the overlay pages compared to the common code page. In the specific
case of the program being monitored in part 2 above, we observe that 17 accesses
were made to RR4 during 774 Kernel instructions. Since two accesses to RR4 are
required to access an overlay instruction page and return to the previous page, the
cost of the 17 RR4 accesses is about 85 instructions. Similarly, the cost of the
accesses to RR2 and RR3 is 24 and 28 instructions respectively. The total cost of
accessing RR's is thus 137 instructions out of 774 (about 18 percent). It should be
noted that this is at best a rough estimate of the real cost involved in maintaining the
RR's. The study of costs and benefits of using a 16 bit machine to write large
programs is an interesting topic for future research.
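The arithmetic behind this estimate can be laid out explicitly. The sketch below is an illustration with hypothetical names. It takes the per-call and per-switch instruction counts quoted above, and it takes the RR3 cost as 28 instructions so that the three costs add up to the stated total of 137.

```python
from typing import Dict

ACCESSES: Dict[str, int] = {"RR2": 12, "RR3": 14, "RR4": 17}   # from part 2
KERNEL_INSTRUCTIONS = 774            # kernel instructions in the part 2 trace

OVERLAY_CALL_INSTRUCTIONS = 10       # extra instructions per overlay subroutine call
RR4_ACCESSES_PER_CALL = 2            # one access to switch pages, one to switch back
DATA_SWITCH_INSTRUCTIONS = 3 + 1     # gain access to a data page, then switch back
DATA_ACCESSES_PER_SWITCH = 2


def instruction_page_cost(rr4_accesses: int) -> float:
    """Instructions spent on calls into overlay instruction pages."""
    return rr4_accesses / RR4_ACCESSES_PER_CALL * OVERLAY_CALL_INSTRUCTIONS


def data_page_cost(accesses: int) -> float:
    """Instructions spent switching an overlay data page register."""
    return accesses / DATA_ACCESSES_PER_SWITCH * DATA_SWITCH_INSTRUCTIONS


def total_cost(accesses: Dict[str, int]) -> float:
    """Total instructions attributed to maintaining RR2, RR3 and RR4."""
    return (instruction_page_cost(accesses["RR4"])
            + data_page_cost(accesses["RR2"])
            + data_page_cost(accesses["RR3"]))


if __name__ == "__main__":
    cost = total_cost(ACCESSES)
    print(f"{cost:.0f} of {KERNEL_INSTRUCTIONS} instructions "
          f"({100 * cost / KERNEL_INSTRUCTIONS:.1f} percent)")
```

Under these assumptions the three costs are 85, 24 and 28 instructions, which is just under 18 percent of the 774 kernel instructions traced.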


Figure 6.5(a) Instruction Trace for Processor 0.


( 1)  MOV   R0,-(SP)               ;save value of R0
( 2)  MOV   R5,-(SP)               ;save value of R5
( 3)  MOV   R1,-(SP)               ;put value of R1 on the stack as a parameter
( 4)  MOV   12(SP),R1              ;put parameter in R1
( 5)  JSR   PC,@(SP)+              ;jump to subroutine whose address
                                   ;is on stack
( 6)  MOV   @#164064,-(SP)         ;save value in RR2 on the stack
( 7)  MOV   @#164066,-(SP)         ;save value in RR3 on the stack
( 8)  SUB   #12,SP                 ;allocate 5 words on the stack
( 9)  MOV   #120,(SP)              ;pass parameter to the SLINK
(10)  JSR   R5,SLINK               ;jump to subroutine SLINK
(11)  MOV   (R5)+,R0               ;move address of the routine to be
                                   ;called to R0
(12)  MOV   @#164070,-(SP)         ;save the value of RR4 on the stack
(13)  MOV   @(R5)+,@#164070        ;move the new value into RR4
(14)  JSR   PC,@R0                 ;jump to subroutine whose address
                                   ;is in R0
(15)  MOVB  6(SP),R0               ;get parameter from the stack
(16)  BIC   #177700,R0             ;clear unwanted bits
(17)  MOV   26332(R0),@#164064     ;move new value into RR2
(18)  MOV   6(SP),R0               ;prepare to return value
(19)  BIC   #20037,R0              ;returned value in R0
(20)  RTS   PC                     ;return from subroutine
(21)  MOV   (SP)+,@#164070         ;pop back value of RR4 from stack
(22)  RTS   R5                     ;return from SLINK
(23)  MOV   R0,R1                  ;some useful work
(24)  CLR   R2                     ;same
(25)  MOV   @#164064,12(SP)        ;move value of RR2 into a local
(26)  MOV   30(R1),10(SP)          ;move value into local
                                   ;address 30(R1) is 02130, which contains #26100
                                   ;NOTE: address 02130 uses new value in RR2
(27)  MOV   10(SP),@SP             ;move local to top of stack
(28)  JSR   R5,SLINK               ;jump to SLINK
(29)  MOV   (R5)+,R0               ;same sequence as above
(30)  MOV   @#164070,-(SP)
(31)  MOV   @(R5)+,@#164070
(32)  JSR   PC,@R0
(33)  MOVB  6(SP),R0               ;and so on.
(34)  BIC   #177700,R0
(36)  MOV   26332(R0),@#164064
(37)  MOV   6(SP),R0
(38)  BIC   #20037,R0
(39)  RTS   PC
(40)  MOV   (SP)+,@#164070
(41)  RTS   R5
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6.1M1))  Iiv.lriit tion  Ti.ifc  (or  P( orf.'ior  3. 


;not relevant

;save the old value of RR2 on the stack
;code to find the new value of RR2
;load the new value in RR2


()'))  MOV  I  6d06/? 


(1)  MOV  .fT'R3,l'1{R2) 

(2)  MOV 'M-n  16/106/1, (5P) 

(3)  MOV  ..fM?9,R2 

(4)  MOV  10<R2),?,-'«t]G/)06/J 


;load the saved value of RR2
6.1.3 Study of memory access types

On the PDP-11, memory words can be accessed in one of four access modes: read,
read-pause, write and write-byte. It is interesting to investigate the relative frequency
of the use of these modes. This measurement is particularly easy with K.mon, since it
has four counters which can be programmed to count the occurrences of these four
modes. Figure 6.6 presents the frequencies of the use of these modes on different
models of PDP-11 executing the same program. A PDP-11 model 20 needs two memory
cycles (a read-pause followed by a write) to write a value in its main memory. PDP-11
model 40, on the other hand, needs only the write cycle. Both models, however,
use the read-pause cycle for instructions like increment and decrement. The
difference in the use of the read-pause cycle is quite evident in the figure. One of the
main uses of this information is in the design of cache memories. C.mmp can benefit
from the introduction of a cache memory for each processor. However, because C.mmp
is a multi-processor, the cache can contain only read-only words. The percentage of
read cycles is thus an important factor in determining the value (i.e. the hit ratio) of
such a cache memory.
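
The conversion from the four counter readings to the percentages reported for each one second interval can be sketched as follows. This is only an illustration in Python: the function name and the sample counts are assumptions made for the example and are not measured values.

```python
def access_mode_mix(read, read_pause, write, write_byte):
    """Return (percentages by access mode, total cycle count)."""
    counts = {
        "read": read,
        "read-pause": read_pause,
        "write": write,
        "write-byte": write_byte,
    }
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no memory cycles were counted")
    # Each percentage is the share of that mode in all counted cycles.
    mix = {mode: 100.0 * n / total for mode, n in counts.items()}
    return mix, total


# Hypothetical counter readings for one interval (not measured data).
mix, total = access_mode_mix(read=790, read_pause=92, write=112, write_byte=6)
for mode, percent in mix.items():
    print(f"{mode:11s} {percent:6.2f} percent")
print("total cycles:", total)
```

The read percentage computed this way is the quantity of interest for a cache that may hold only read-only words.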
Workload: Hydra executing the program described in section 6.1.1.
The counts are made over one second duration. The values for
ten typical seconds are provided below for the two processors.

Figure 6.6 (a) Processor 3 (PDP-11/40)

          Read      Read-pause   Write     Write-byte   Total
          percent   percent      percent   percent      cycles

          85.6?     2.83         10.76     0.80         338217
          85.04     2.70         11.44     0.81         350160
          84.07     2.?9         11.53     0.81         333073
          85.4?     2.84         10.?8     0.80         325587
          85.58     2.70         10.?2     0.80         20135?
          85.50     2.60         10.04     0.77         350704
          85.11     2.71         11.37     0.81         355535
          85.14     2.60         11.35     0.80         326851
          85.59     2.70         10.?2     0.80         2?6232
          85.00     2.72         11.37     0.81         363085

   Mean   85.32     2.745        11.128    0.801

Figure 6.6 (b) Processor 0 (PDP-11/20)

          Read      Read-pause   Write     Write-byte   Total
          percent   percent      percent   percent      cycles

          79.25     9.11         11.08     0.56         247756
          79.18     9.12         11.16     0.54         253720
          78.97     9.25         11.23     0.56         242460
          78.60     9.40         11.47     0.56         24081?
          79.19     9.12         11.16     0.53         238226
          79.02     9.19         11.26     0.52         227607
          78.92     9.24         11.33     0.51         248309
          79.06     9.18         11.21     0.53         230657
          79.03     9.22         11.17     0.58         244516
          78.75     9.32         11.42     0.50         232767

   Mean   78.997    9.215        11.249    0.536

6.1.4 Comprehensive unibus cycle trace

This is one of the traditional measurements which is capable of answering many
questions regarding the hardware architecture. Such a trace has been described by
IjOifl-.rn [ fX I ] for the Univac 1108. To gather such a trace, the hardware monitor
has to possess a very high bandwidth output device such as a fixed head drum or a
large amount of core memory. Unfortunately, K.mon lacks such features and is thus not
very useful for this measurement. It is, however, possible to record 160 consecutive
unibus cycles before the internal buffers in K.mon overflow. We could therefore gather
a trace consisting of windows of 160 consecutive cycles which could still provide many
of the answers. The trace was post-processed to yield the instruction mix, frequently
occurring instruction sequences, frequency of use of all combinations of mode and
register pairs, a histogram of the number of memory cycles per instruction, a
histogram of index values off the stack register and a histogram of immediate mode
operands. Because of the limitation of gathering only 160 consecutive cycles, we could
not study the branch distance behavior. Appendix C presents the results of this study.

It should be noted that it was not possible to obtain a trace consisting of more than
a few thousand instructions. We could not perform this measurement on more than a
few systems. The results are not very general and do not compare very well with
those obtained using other methods. For example, appendix C displays the instruction
mix obtained using the trace. It shows that the instruction 'MOV' was used only 16
percent of the time in the traced instructions. This does not agree with the value of
31.2 percent obtained in chapter 4. The discrepancy arises because appendix C
reports the results for only 12000 instructions from one specific program. The rest of
appendix C should also be used with caution for the same reason.

Future research should concentrate on obtaining longer traces of many different
programs in order to draw meaningful conclusions. It will then be possible to answer
questions about the PDP-11 architecture and it will also help in
the design of new architectures. For example, it is interesting to determine the utility
of providing immediate operands (3 or 4 bits long) in PDP-11 instructions. Appendix C
displays how many small immediate mode operands are used in the program under
study. Similarly, the histogram of index values off the stack pointer presents how
formal parameters passed on the stack are accessed by the program. Another
measurement (not presented here) is to examine how often a 'MOV' instruction uses
addressing mode 0 for its source or destination operands. In all such cases single
operand LOAD or STORE instructions would have been sufficient. There are many such
questions that can be answered once the traces are obtained. The statistical
experiment presented in chapter 4 can be used to quantify the variance of the
measured quantities.

6.2. Operating System Design Level

As discussed in section 2.2, a hardware monitor is a useful tool for measurements at
this level also. Many measurements can be performed without altering the operating
system, but the task is simplified if the operating system can be modified to supply
certain hard-to-obtain parameters to the hardware monitor. We will present only three
of the many measurements performed using K.mon since these are more generally
applicable.

1. the execution profile

2. study of processor hardware priority changes

3. functional trace of the operating system
6.2.1 The execution profile

Execution profile refers to the measurement of the frequency of execution of
different regions in a program. The technique used is to sample the program counter at
random and output its value. This generates a list of absolute addresses which by itself
cannot be easily interpreted by operating system programmers. This list is therefore
post-processed using a map of the program to generate the execution profile. It is
then possible to optimize the heavily used portions of the program or to alter the
algorithm used.

There are two problems that have to be solved before such a measurement becomes
feasible. One problem is at the input level, where the hardware monitor has to select
the program counter values belonging to the operating system. This can usually be
done by using address comparators to isolate the operating system region from the
machine's address space. The operating system can also provide this information to the
hardware monitor using some signalling mechanism. The second problem is at the
post-processing level, when the mapping between the operating system's routine
names and their absolute addresses cannot be determined because portions of the
operating system are dynamically relocatable and/or overlaid. There are no general
solutions to this problem, except that the operating system can be programmed to
supply the new overlay number when it is brought in. For Hydra, we look at the value
being written into RR2 (see section 6.1.2) to determine which overlay page is being
used.
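
The post-processing step can be sketched as follows. This is a minimal illustration in Python and not the analysis program used for the measurements. It assumes that each sample has already been tagged with the overlay page in force when it was taken, and that the map gives the starting address of every routine per page. The routine names and addresses below are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def build_profile(samples, routine_map):
    """Count program counter samples per routine.

    samples     -- iterable of (overlay_page, address) pairs
    routine_map -- dict: overlay_page -> list of (start_address, name),
                   sorted by start_address
    Returns a list of (routine name, sample count), most sampled first.
    """
    profile = Counter()
    for page, address in samples:
        entries = routine_map.get(page, [])
        starts = [start for start, _ in entries]
        # The last routine starting at or below the sampled address owns it.
        # The map carries no end addresses, so a sample beyond the last
        # routine of a page is attributed to that last routine.
        index = bisect_right(starts, address) - 1
        name = entries[index][1] if index >= 0 else "<unknown>"
        profile[name] += 1
    return profile.most_common()


# Hypothetical map and samples (octal addresses, as on the PDP-11).
routine_map = {
    0: [(0o1000, "ROUTINE_A"), (0o2000, "ROUTINE_B")],
    1: [(0o1000, "ROUTINE_C")],
}
samples = [(0, 0o1004), (0, 0o2010), (0, 0o2100), (1, 0o1500), (0, 0o400)]
for name, count in build_profile(samples, routine_map):
    print(name, count)
```

Keying the map by overlay page is what makes the same absolute address resolve to different routines, which is the reason the overlay number has to be recorded with each sample.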


14. A map is a listing of the routines in the program with absolute addresses.
The execution profile for Hydra is shown in appendix D. Our measurement helps in
deciding which routines should be in the resident part and which should be in the
overlay part. It has also helped in selecting routines for optimization and for
implementation in microcode. It can be seen that considerable time is being spent in
the register save/restore routines. These are natural candidates for implementation in
microcode. Our analysis program also provides a list of routines ordered according to
the number of instruction samples falling in them. It can be seen that many of the
commonly used routines (e.g. LNQ, SELCTL and l?i‘0rPO in page 44, and MKRN.C in
page 23) can be moved to the fixed code page, thereby avoiding changing RR2 every
time they are called.

6.2.2 Changes in the processor hardware priority

The PDP-11 has eight processor priority levels. Execution at any of these levels can
only be interrupted by an interrupt occurring at a higher priority. Hydra uses a
convention that the user programs can execute at processor levels from 0 through 3
and the operating system executes at levels 4 through 7. Different devices cause
interrupts at different levels from 4 through 7. Since device interrupts have to be
serviced within a short time of their occurrence, it is necessary to restrict the
operating system execution at high priority levels. K.mon was used to detect high
priority executions exceeding a certain threshold time. Since processor priority is not
available as a signal on the unibus, special probes were connected to the priority
bits in the processor. K.mon was programmed to detect the changes in priority level
and record the time at which the change occurred and the address of the first
instruction following the change. The supervisory program on P.sup detected the
executions exceeding the threshold.
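
The threshold test performed on the recorded priority changes can be sketched as follows. This is an illustrative Python fragment, not the supervisory program itself. It assumes each record is a (time in microseconds, new priority level, instruction address) triple and treats levels 4 through 7 as high priority, following the Hydra convention described above. The sample records are hypothetical.

```python
def long_high_priority_runs(changes, threshold_us, high_level=4):
    """Report high priority executions longer than threshold_us.

    changes -- time-ordered list of (time_us, new_level, address)
    Returns a list of (address after the rise, start time, duration).
    """
    runs = []
    start = None  # (time, address) at which the priority first rose
    for time_us, level, address in changes:
        if level >= high_level and start is None:
            start = (time_us, address)
        elif level < high_level and start is not None:
            duration = time_us - start[0]
            if duration > threshold_us:
                runs.append((start[1], start[0], duration))
            start = None
    return runs


# Hypothetical records; the second high priority run lasts 1700 microseconds.
changes = [
    (0, 0, 0o1000),
    (100, 4, 0o2000),
    (600, 0, 0o1002),
    (1000, 6, 0o3000),
    (2500, 5, 0o3010),
    (2700, 0, 0o1004),
]
for address, start_us, duration in long_high_priority_runs(changes, 1000):
    print(oct(address), start_us, duration)
```

Changes between two high levels (6 to 5 in the sample) do not end a run in this sketch; only a return to a level below 4 does.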
Sample output of this experiment is provided below in figure 6.7. The address of
the instruction (including the page number for addresses in the overlaid instruction
pages) following the rise in the priority level is given so that corrective action can be
directed at the proper place in the operating system. In figure 6.7, all the high
priority execution was caused by device interrupts and so we directly report the
device causing the interrupt instead of giving the address of the interrupt routine.
The following measurement was performed to detect the high priority execution
exceeding one millisecond.


Figure 6.7 Changes in Processor Hardware Priority
6.2.3 Functional trace of an operating system

The purpose of this measurement was to measure the CPU and I/O processing
characteristics of RSX11-M for use as input to a simulation model of this operating
system. A set of major processing functions was identified for use in the model. These
were:

terminal input handling
task activation
task initiation (with and without checkpointing)


15. This experiment was performed jointly by Dermol Dr^din of Digital Equipment Corp. and the author.
task execution (a Fortran program with subroutine calls, disk I/O and an
overlay structure)
task termination
terminal output handling

A number of places in the operating system code were then identified so that a trace
of execution at these places is sufficient to give the time required to perform the
above functions. One word in the operating system's data area was designated as the
'hook' word and code was introduced at these places to write a value in the hook
location to uniquely identify the place. K.mon was set up to detect any 'write'
operation into the hook word and record the value written and the time stamp. All the
commands given to the disks were also monitored separately by tracing all the 'write'
operations into the device registers (see section 6.5 for another experiment
performed by monitoring the device registers). The output of the monitor was then
analysed to give a complete trace of activities of the operating system and the disks. It
was necessary to restrict the execution to a single user executing a simple program in
order to interpret the trace successfully. A small part of the trace is included in
Appendix II. The information obtained with this measurement can also be obtained by
tracing the events in software. The only reason K.mon was used for this measurement
was to introduce negligible perturbation in the operation of the operating system.
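
The reduction of the hook word trace to per-function times can be sketched as follows. This Python fragment is illustrative only. It assumes that every identified place writes a distinct value and that each function is bracketed by one 'begin' and one 'end' value. The encoding in the markers table and the sample trace are hypothetical.

```python
def function_times(hook_writes, markers):
    """Accumulate the time spent in each traced function.

    hook_writes -- time-ordered list of (time_us, value written)
    markers     -- dict: value -> (function name, "begin" or "end")
    Returns a dict: function name -> total microseconds.
    """
    open_since = {}  # function name -> time of its pending 'begin'
    totals = {}
    for time_us, value in hook_writes:
        if value not in markers:
            continue  # a value this analysis does not know about
        name, kind = markers[value]
        if kind == "begin":
            open_since[name] = time_us
        elif name in open_since:
            elapsed = time_us - open_since.pop(name)
            totals[name] = totals.get(name, 0) + elapsed
    return totals


# Hypothetical encoding and trace.
markers = {
    1: ("task activation", "begin"),
    2: ("task activation", "end"),
    3: ("task termination", "begin"),
    4: ("task termination", "end"),
}
trace = [(100, 1), (350, 2), (400, 1), (900, 2), (950, 3), (1000, 4)]
print(function_times(trace, markers))
```

Restricting the run to a single user executing a simple program, as described above, is what makes this simple pairing of 'begin' and 'end' writes sufficient.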

6.3. Systems Programming Level

This level includes the compilers and their run-time systems, the utility programs
and the file systems. The execution profile (see previous section) is again the most
important parameter at this level. This level is characterised by a strong interaction
with the operating system. It is therefore interesting to monitor the service calls to
the operating system. On the PDP-11, a service call is initiated by executing a special
instruction (EMT or TRAP) and the operating system returns to the user also using a
special instruction (RTI or RTT). K.mon was programmed to monitor the execution of
these instructions and record the type of call and the time stamp. This yields the
frequency of use of the different service calls and the histogram of the execution time
for each. In some cases, the service calls to the operating system consume a significant
amount of the total execution time of a systems program. It is sometimes possible to
alter an algorithm to eliminate certain service calls and hence a measurement was
designed to give the total time consumed by each call in a Fortran compiler.

Figure 6.8 lists the instruction addresses in the Fortran compiler from where an
operating system service call is made. An instruction address is uniquely identified by
the pair (overlay number, address). For each instruction address, the maximum time
spent in the operating system to complete the call from that address is given along
with the total time spent in the operating system due to a call at that address. The
total number of times a call at a particular address is executed is also given to guide
the optimisation of compiler algorithms.

The measurements were made for the entire duration of the compilation for a typical
user program. The Fortran compiler was the only program running on the computer
when the measurements were made.
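
The per-address figures described above (number of calls, maximum time and total time) can be derived from the recorded call and return events as in the following Python sketch. It is an illustration under stated assumptions, not the program used for the measurement: each EMT or TRAP is recorded as a 'call' event carrying its (overlay number, address) pair, each RTI or RTT as a 'return' event, and a return is paired with the most recent unreturned call. The sample events are hypothetical.

```python
def service_call_stats(events):
    """Summarise operating system service calls by calling address.

    events -- time-ordered list of (time_us, kind, site), where kind is
              "call" or "return" and site is (overlay, address) for calls
              and None for returns
    Returns a dict: site -> {"count", "max", "total"} (times in microseconds).
    """
    stats = {}
    pending = []  # stack of (start time, site) for calls not yet returned
    for time_us, kind, site in events:
        if kind == "call":
            pending.append((time_us, site))
        elif kind == "return" and pending:
            start, call_site = pending.pop()
            elapsed = time_us - start
            entry = stats.setdefault(call_site, {"count": 0, "max": 0, "total": 0})
            entry["count"] += 1
            entry["max"] = max(entry["max"], elapsed)
            entry["total"] += elapsed
    return stats


# Hypothetical events: two calls from one address, one from another overlay.
events = [
    (0, "call", (0, 0o1000)),
    (40, "return", None),
    (100, "call", (0, 0o1000)),
    (130, "return", None),
    (200, "call", (2, 0o2000)),
    (215, "return", None),
]
for site, entry in service_call_stats(events).items():
    print(site, entry)
```

Because RTI and RTT also end interrupt service, this pairing is only reliable when interrupt returns are filtered out or, as in the measurement above, when the program under study runs alone on the machine.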


Figure 6.8 Time Consumed by Calls from a Fortran Compiler

Total time of measurement: ?)C>rf6 milliseconds
Total time inside the compiler: H) 1 milliseconds = 42.1 percent

                                  File system: Wait for I/O completion
0   9   0   27272   39            Assign logical unit number
                                  (similar to channel number)
0   27406   403 23   40           Assign logical unit number
0   17024   3565   13             Get task parameters
0   17032   S7J1   13             Get task parameters
0   17114   7341   13             Set software trap vectors
0   13   14                       Assign logical unit number
                                  (for the file system)
2   13754   1202   14031   14     Assign logical unit number
                                  (for the file system)

This measurement can also be conducted in software, but a hardware monitor can
gather this information independent of the operating system as long as the same
instructions are used for call entry and exit.

In order to help reduce the overall time required for compilation, the following
experiment was performed. K.mon was set up to monitor three states while the
compiler was run stand-alone on the machine:

1. Compiler execution

2. Operating system execution

3. Wait for a device (usually a disk) to complete transfer




The output of the experiment includes the percentage of time spent in each of the
three states. A large waiting time indicates poor overlap structure of the compiler.
Even though in a multi-programming system a large waiting time does not necessarily
cause decreased throughput, it does increase the response time experienced by the
users. This measurement can be used continually to quantify the improvement (or
otherwise) in the execution of the systems program as modifications are made in the
program or in the operating system algorithms. Figure 6.9 presents our results for the
Fortran compiler. It can be seen that the compiler exhibits a poor overlap structure,
spending over a third of the elapsed time waiting for the I/O completion.

Figure 6.9 Fortran Compiler: Overlap Structure

Total time of measurement: 33380 milliseconds
Total time in the User state: 18373 milliseconds = 55.04 percent

Total time in the non-User state: 15007 milliseconds = 44.96 percent
Breakdown of non-User time:

Total I/O wait time: 11665 milliseconds = 34.95 percent

Total operating system overhead: 1723 milliseconds = 5.16 percent

Total file system overhead: 1619 milliseconds = 4.85 percent
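
The arithmetic behind the breakdown can be checked with a short calculation. The Python sketch below uses only the totals quoted in Figure 6.9 (33380 ms elapsed, 15007 ms in the non-User state, 1723 ms of operating system overhead and 1619 ms of file system overhead). The User state and I/O wait times are derived here by subtraction, which assumes that the three non-User components account for all of the non-User time.

```python
total_ms = 33380        # total time of measurement
non_user_ms = 15007     # time in the non-User state
os_overhead_ms = 1723   # operating system overhead
file_system_ms = 1619   # file system overhead

# Derived by subtraction (an assumption of this sketch, see above).
user_ms = total_ms - non_user_ms
io_wait_ms = non_user_ms - os_overhead_ms - file_system_ms


def percent(part_ms):
    """Share of the total measurement time, rounded to two decimals."""
    return round(100.0 * part_ms / total_ms, 2)


breakdown = {
    "user state": (user_ms, percent(user_ms)),
    "non-user state": (non_user_ms, percent(non_user_ms)),
    "I/O wait": (io_wait_ms, percent(io_wait_ms)),
    "operating system overhead": (os_overhead_ms, percent(os_overhead_ms)),
    "file system overhead": (file_system_ms, percent(file_system_ms)),
}
for name, (ms, pct) in breakdown.items():
    print(f"{name:26s} {ms:6d} ms  {pct:5.2f} percent")
```

Under this assumption the I/O wait comes to 11665 ms, a little over a third of the elapsed time, which is the basis for the remark about the compiler's poor overlap structure.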

6.4. Applications Programming Level

K.mon is designed for the PDP-11, which is a mini-computer, and consequently we were
not able to apply it to any large installation supporting many applications programs.
We therefore have limited experience in the applicability of a hardware monitor at this
level. Instruction execution profile remains the most important parameter even at this
level. The problems mentioned in section 6.2.1 are compounded by the fact that the
execution profile at the high level language statement level is required by the
applications programmers and this is very difficult to obtain using a hardware monitor.
There are many ways to get around these problems. In our opinion, the use of a
hardware monitor for this measurement becomes clumsy and not generally applicable.
The best solution will be to provide the necessary facilities in the compilers and
operating systems.

6.5. Installation Management Level

As mentioned in the previous section, K.mon has not been applied to any large
installation supporting many users, for which installation management is necessary. The
main performance parameter at this level is the equipment utilization and overlap. We
therefore designed an experiment to measure the overlap between the processor and
any device (we have restricted our attention to disks and drums only). We did not
consider number of jobs per day or the average CPU utilization as meaningful
measurements with K.mon.

Since devices are not accessed through channels on the PDP-11, we experienced
one simplification and one problem. The simplification is that separate probes are not
needed to monitor the channel activity. The device registers are accessed through the
unibus and so they can be monitored with the address comparators in K.mon. The
problem arises in the definition of 'overlap'. In conventional machines, the channel busy
and processor busy signals can be AND'ed together to detect overlap. We defined
overlap as the number of processor cycles between issuing the start read/write
command to a device and receiving the completion interrupt from the device. Clearly, if
the processor executes a WAIT instruction immediately after giving the start command
to the device, the overlap will be zero.
K.mon is set up as follows:

Event 0: 1 microsecond clock

Event 1: 'write' into any device register. When detected, outputs time stamp,
register address, value being written, values of counters 3 and 4.

Event 2: fetch of an interrupt vector. When detected, outputs time stamp,
interrupt vector address (this identifies the device causing the
interrupt), values of counters 3 and 4.

Event 3: counts in counter 3 the number of unibus cycles initiated by the
processor.

Event 4: counts in counter 4 the number of all unibus cycles.

The trace is analysed by a program which interprets the commands given to the
various devices and effectively reconstructs the device activity. It can, for example,
find out which cylinder of a disk was accessed. The previous position of the disk arm
is available to the analysis program from its interpretation of the previous command,
hence it can determine the magnitude of the disk arm movement for every command. It
can also determine the number of words transferred with every command, the time
taken to complete seeks and transfers and the utilization of the different units of a
device. Overlap is the difference between the values of counter 3 between event 1
('go' command being given to the device) and the following event 2 (interrupt). When
a device receives a read/write command, it cannot accept any other command until the
transfer initiated by the first command is complete. So the utilization of a device
(similar to the utilization of a channel) can be defined as the total time a device was
busy divided by the total time of measurement.
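
The overlap and utilization definitions above can be sketched in Python as follows. The fragment is illustrative and is not the analysis program itself. It assumes that the trace has already been reduced to records of the form (kind, time in microseconds, device, counter 3 value), with kind 'go' for an event 1 that starts a transfer and 'interrupt' for the matching event 2. Counter wrap-around is ignored and the sample records are hypothetical.

```python
def device_overlap(events, measurement_time_us):
    """Compute busy time, overlap and utilization per device.

    events -- time-ordered list of (kind, time_us, device, counter3)
    Returns a dict: device -> {"busy_us", "overlap_cycles", "utilization"}.
    """
    pending = {}  # device -> (time, counter 3) at its last 'go' command
    result = {}
    for kind, time_us, device, counter3 in events:
        if kind == "go":
            pending[device] = (time_us, counter3)
        elif kind == "interrupt" and device in pending:
            start, cycles_at_go = pending.pop(device)
            entry = result.setdefault(device, {"busy_us": 0, "overlap_cycles": 0})
            entry["busy_us"] += time_us - start
            # Processor-initiated unibus cycles while the device was busy.
            entry["overlap_cycles"] += counter3 - cycles_at_go
    for entry in result.values():
        entry["utilization"] = entry["busy_us"] / measurement_time_us
    return result


# Hypothetical records: the second transfer has zero overlap, as it would
# if the processor executed WAIT right after the start command.
events = [
    ("go", 0, "disk", 1000),
    ("interrupt", 400, "disk", 1600),
    ("go", 500, "disk", 1700),
    ("interrupt", 700, "disk", 1700),
]
print(device_overlap(events, measurement_time_us=1000))
```

Keeping one pending command per device relies on the property stated above that a device cannot accept another command until the current transfer is complete.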

Appendix F contains a sample output of this experiment. Because of the low output
bandwidth of K.mon, the trace consists of many windows from the actual execution of
the system. This could have been avoided by using a hybrid scheme where the timing
and overlap data were obtained with K.mon and the remaining data obtained by
inserting suitable measurement code in the operating system. It is however advisable
to perform the experiment without modifying the operating system, since the
same experiment can then be used for measurements on any operating system.


7. Conclusions and Further Research


In this dissertation we have concentrated on the measurement and analysis problem
of computer systems at the hardware architecture and the operating system kernel
design levels. In chapter 2 we examined the performance parameters at various system
levels and discussed the applicable measurement tools. Since our interest lies in the
phenomena covering the range of a few instructions, the most appropriate
measurement tool for our purpose was a hardware monitor. A hardware monitor,
however, is a versatile tool applicable to other system levels as well, so some effort
was devoted to studying the hardware monitoring techniques in general. A brief
description of our hardware monitor K.mon was presented in chapter 3.

Chapter 4 discussed a major experiment designed to address the question of the
variability of the instruction mix. The experiment was designed to quantify the
variance caused by 3 factors and to enable comparison of their effects. Application of
statistical experimental design methodologies is relatively new in the field of
performance evaluation. We hope our success will trigger more interest in the
scientific design of experiments in this field. Our measurements indicate that a
statistically significant variation in the instruction mix is caused both by the application
area and the program in a given area but not by the different phases of execution in
the same program. It is therefore not advisable to attempt to over-optimize a
processor for a particular application area. We also quantified an intuitively well
understood fact that all the addressing modes of the PDP-11 are not equally useful.
For double operand instructions, mode 5 (auto-decrement deferred) is almost never
used. Many of the instructions were also shown to be seldom used. These results are
important for the design, implementation or emulation of PDP-11 or similar processors.


Here too, as in any other inquiry, answers to one set of questions give rise to new
questions. For example, two of our application areas use high level languages (Fortran
and Cobol). It would be interesting to investigate the variance due to the use of
different compilers. In our study, we have ignored this effect by assuming that similar
machine instructions will be used to accomplish the type and amount of real 'work'
being demanded by a high level language statement, independent of the compiler used.
But this assumption certainly needs to be investigated. It will also be interesting to
perform a similar experiment on a larger machine for which Cobol compilers have been
available for many years and which possesses Cobol specific machine instructions.

It is interesting to look at the whole experiment in the light of the workload
characterization problem. Intuitively the different application areas represent different
workloads since it is clear that each of these areas is doing a different kind of 'work'.
The Fortran programs are manipulating numbers for the purpose of solving equations,
the operating systems are performing the processor and memory scheduling functions
whereas the real time systems are responding to the events happening in their
environments. Our experiment is an attempt to characterize these intuitively different
workloads in terms of their instruction mixes. It turns out that a meaningful
characterization at such a low level is not possible due to the variation in the
programs belonging to these areas. This negative result should not be interpreted as
saying that a characterization at a higher level is not possible; in fact, future research
should concentrate on the next higher level of atomic 'work', e.g. in terms of
manipulations of higher level data structures like vectors, lists, process control blocks,
and strings.

Chapter 5 analysed the problem of software lockout for multi-processor
operating systems. In order to maintain system integrity, certain shared objects have
to be accessed by only one processor at a time. Such a mutual exclusion gives rise to
critical sections of code which can be executed by only one processor at a time. This
results in a loss of time if a processor has to wait to access a shared data object until
it becomes free. We showed that in Hydra, only three parameters control the time lost
due to software lockout: the average lengths of the critical sections, the relative
frequencies of use of the various shared data objects and the number of processors in
the system. In other systems, the critical sections might be nested inside one another.
A different model will have to be evolved in these cases. We observed that Hydra has
been quite successful with respect to the software lockout problem; less than 1
percent of time was lost due to lockout even for a fairly busy system. Hydra contains
many shared objects but the critical section times have been kept small, resulting in
smaller lost time. It might be interesting to consider other designs where fewer and
larger critical sections are used with (perhaps) some saving in complexity or time
without paying too much penalty in lost time.

Chapter 6 discussed other applications of K.mon. K.mon's limitations become
apparent in the study of the memory contention problem and also in obtaining a
complete memory cycle trace. Future hardware monitors should have provisions for
measuring memory cycle times and for tracing each machine cycle. It can be seen,
however, that a hardware monitor is applicable at all the system levels and this fact
should be kept in mind when designing future hardware monitors.

Appendix A: Survey of Hardware Monitoring Techniques

A.1. Introduction

A list of performance parameters at various system levels and the various
experiments performed using hardware monitors to gather these parameters were
discussed in the previous chapter. This chapter surveys the hardware monitoring
techniques that have been used in the past to perform these measurements. The study
of hardware monitoring systems is broken down into three dimensions:

1. the event detection mechanism

2. the event response specification

3. the display mechanism.

Hardware monitoring systems from the initial IBM 7090 monitors to the current state
of the art monitors are discussed along these dimensions.

Performance monitors for computer systems are available in many forms, often
designed to measure very different parameters of an operational system. Performance
monitors can be broadly classified into two types. First, the hardware monitors capable
of sensing bits and words of the computer system's status. Second, the software
monitors capable of interrogating software structures such as queues and job tables.
More recently hybrid monitors have been used. These are hardware monitors assisted
by software on the measured system to obtain information which is not available or is
difficult to get for a pure hardware monitor.

Various computer professionals have different reasons for initiating measurement of
a computer system. These include gaining understanding of the dynamic behavior of a
system, observation and prediction of the effects of hardware and software changes,
obtaining parameters for analytic or simulation modelling and model validation. The
performance parameters at various levels in a computer are also different.

Figure 2.1 attempts to characterize the performance parameters at various system
levels and the most valuable performance tool for each level.

Since the performance parameters at any level depend on the parameters of levels
below it, it is possible to measure parameters at any level using tools most applicable
to lower levels. The hardware monitor, therefore, is a useful measurement tool at all
system levels and an indispensable tool at the hardware architecture and operating
system kernel design level. Earlier hardware monitors were restricted to low system
levels but most new commercial monitors are geared towards the installation manager
and applications programmer levels. This fact should be kept in mind when comparing
past and present commercial hardware monitors.

A.2. Functional Components of a Hardware Monitor

The hardware monitors have evolved over time from simple summary devices for the
IBM 7090 through plug board monitors to today's programmable monitors driven by a
mini-computer. Central to all these monitors, however, is the concept of an 'event'. An
event is the occurrence of a particular state on the system under measurement. The
event we are interested in can also be a combination or a sequence of other events.
An event can be as simple as an occurrence of an instruction fetch cycle or as complex
as the first operand fetch cycle after executing the instruction at a certain location
while executing a particular user's program. Some monitors sample a slowly varying
input signal for being true or false at a certain sampling rate. The occurrence of the
sampling pulse can be considered an event for the purpose of our discussion.


Since events can be so complex and since different events need to be monitored for
different experiments, the event detection mechanism is the most important part of a
hardware monitor. After detecting an event, the monitor can just increment a counter
or update a histogram, or time stamp the event and store it in the secondary storage.
In theory, it is possible to store every event with its time stamp and all the other
information and later process the whole data. It is however much more economical to
do some selection in the monitor itself and restrict the flow of data to only pertinent
information or some condensed form of the complete information. On the other hand, if
only the condensed form of information is obtained from the hardware monitor, its
utility is limited to gross measurements. Event response specification is therefore
another important aspect of a hardware monitoring system. Finally, the gathered data
has to be presented in an understandable form for the users. Some of the data can be
presented while the data collection is going on, some has to be post-processed by a
computer and some data needs to be stored to build a data-base. How the data is
presented is important for the monitoring system to gain acceptance.

Before we discuss the characteristics of existing hardware monitoring systems, let
us briefly enumerate some of the experiments that can be performed using hardware
monitors. The experiments span measurements at all system levels shown in Figure
A.1. Some of the performance parameters at the upper levels are best obtained using
a software monitor even though hardware monitors have been used to obtain these.
We have restricted our experiments to those that need to use a hardware monitor for
any of the following reasons.

1. Events at machine cycle level are not accessible to a software monitor, e.g.
overlap of I/O and CPU, cache hits.

2. Software monitors are often dependent on the operating system. This makes it
necessary to program separate monitors for each operating system even when
the operating systems are written for the same machine. Moreover, some
software monitors require the language compilers to be modified.

3. An artifact or perturbation is introduced in the measured system with any
software measurement technique. This artifact cannot be ignored in some
important experiments, e.g. measurements on time critical operating system
functions.

4. Some measurements are prohibitively expensive if performed in software, e.g.
counting instruction usage or accesses to an active data structure.

5. The hardware monitor possesses high speed counters and a high resolution timer
without which most counting and timing measurements are impossible.

The applicable reasons out of these are indicated for each experiment in the figure.


Figure A.1: Experiments Performed Using Hardware Monitors

(Figure note: applicable only to machines having a relocation register.)

A.2.1 Event Detection

The input to a hardware monitor is the various status and bus lines in
the system under measurement (called P.host). A hardware monitor usually passively
senses the values of the input signals. High impedance probes are provided so that a
very small load is put on the signal under measurement by connecting the probe to it.
For a general purpose monitor, the probes need to be calibrated for the specific
voltage values in the P.host. There is also a problem of synchronization since the
probe input is not valid when the corresponding signal is changing its state.

In the early monitors, event detection was achieved with the help of a logic plug
board that consisted of an assorted collection of gates, latches and decoders. It was
therefore very time consuming to set up an experiment or to switch from one
experiment to another. A look at Figure A.1 will show that one of the most important
parts of event detection is address comparison. It is necessary to be able to detect the
occurrence of a specific address or any address in a given range. It is also important
to detect particular values of data being accessed or those of other probe inputs. Most
modern monitors are programmable, that is, the function of the plug board is achieved
by setting bits and registers in the monitor under the control of the supervisory
computer (called P.sup). The hardware monitor designed and built at Carnegie-Mellon
University is a programmable monitor (called K.mon). A brief description of K.mon
event detection is given below as an example of programmable monitors.

The event detecting part of K.mon is shown in Figure A.2.

The event detector senses events at the unibus cycle (i.e. memory fetch) level. It is
composed of two types of modules, comparators and bit masks. A comparator has an
internal register (set by P.sup) and it produces two output signals, 'input = register'
and 'input < register'. A bit mask has two internal registers (set by P.sup). One
specifies the care/don't care conditions for each bit, the other specifies the expected
bit pattern. A single output indicates if the input satisfies the specified pattern. Signal
inputs for the comparators and the bit masks are arranged in four groups: unibus
address, unibus data, miscellaneous probes and control. The control group includes
unibus control signals. In the current implementation four events can be defined
simultaneously.
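
The two module types can be illustrated with a short software model. This is only a sketch under stated assumptions: the class and function names are hypothetical, a 16-bit signal group is assumed, and the comparator's second output is taken to be 'input less than register'. It shows how a bit mask tests a pattern with don't-care bits and how two comparators together detect an address in a given range.

```python
from dataclasses import dataclass
from typing import Tuple

WORD_MASK = 0xFFFF  # assumption: each signal group is 16 bits wide


@dataclass
class Comparator:
    """Hypothetical model: one internal register, two output signals."""
    register: int

    def outputs(self, value: int) -> Tuple[bool, bool]:
        # Returns (input == register, input < register); the ordering
        # output is an assumption made for this sketch.
        value &= WORD_MASK
        return value == self.register, value < self.register


@dataclass
class BitMask:
    """care: 1 bits are compared, 0 bits are don't-care.
    expected: the bit pattern required on the cared-for bits."""
    care: int
    expected: int

    def matches(self, value: int) -> bool:
        return (value & self.care) == (self.expected & self.care)


def in_range(value: int, low: Comparator, high: Comparator) -> bool:
    """Address-range event: low.register <= value < high.register."""
    _, below_low = low.outputs(value)
    _, below_high = high.outputs(value)
    return (not below_low) and below_high
```

For example, `in_range(addr, Comparator(0o1000), Comparator(0o2000))` is true for every address from octal 1000 up to but not including octal 2000, and `BitMask(care=0o177700, expected=0o012700)` accepts any word whose upper ten bits equal those of octal 012700.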

In some commercial monitors, the plug boards are removable and pre-wired plug
boards for certain standard experiments are provided. ADAM [HUGH74] has an
interesting way of event detection. It has a monitor register which holds all the
selected input signals. The monitor register is suitably masked and then compared
simultaneously to sixty four bit patterns in a content addressable memory. The
matching pattern determines the event that has happened.
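
The masked lookup described for ADAM can be sketched in software as follows. The function name and the list representation of the content addressable memory are assumptions for illustration; the hardware compares all patterns at once, whereas this sketch scans them in turn.

```python
from typing import List, Optional

MAX_PATTERNS = 64  # the text gives sixty four patterns


def cam_match(monitor_register: int, mask: int,
              patterns: List[int]) -> Optional[int]:
    """Return the index of the stored pattern equal to the masked
    monitor register, or None if no pattern matches."""
    if len(patterns) > MAX_PATTERNS:
        raise ValueError("at most 64 patterns can be stored")
    masked = monitor_register & mask
    # Sequential stand-in for the simultaneous hardware comparison.
    for index, pattern in enumerate(patterns):
        if masked == pattern:
            return index
    return None
```

The returned index plays the role of the event number: which pattern matched tells the monitor which event has happened.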

The Waterloo hardware monitor (RCHM [MORG73]) implements a very sophisticated
plug board in which some of the connections can be made under program control. For
example, it contains a programmable 16 X 4 switch matrix which allows up to 4 of its
16 input signals to be available as its output. It also has programmable combinatorial
logic units which accept 8 input signals and yield one output signal which is any logical
function of the inputs. There is a hardware unit to detect the sequences of events. In
addition to the above, it has comparators, interval timers, event/time counters and
character detectors to aid event detection. Even though the hardware components are
programmable, the inputs to those are usually from the plug board and the outputs are
also usually available only on the plug board. This arrangement allows small changes in
the experiments to be made under program control without manual intervention.

A.2.2 Event Response Specification

When an event is detected, it is sometimes sufficient to just record the fact whereas
at other times, it is necessary to record more information like the address or data that
caused the event to take place. A time stamp and the values of internal counters may
also be required for later analysis. In the early monitors, only summary type
information was made available. So the only response to any event was to increment
an internal counter. This is sufficient if only gross average values of the measured
quantities are required. If, however, one needs to generate histograms for constructing
analytical or simulation models, more detailed information has to be obtained by the
hardware monitor. In K.mon, when an event is detected, up to 9 words of information
can be obtained. These are: address, data, probes, miscellaneous signals, clock value
and four words giving the number of times each of the four events has occurred so far.
Moreover, two internal flags can be set or reset to facilitate detecting a sequence of
events. K.mon thus acts like a filter which detects certain interesting cycles out of a
vast number of unibus cycles and then makes selected inputs available. This
arrangement is necessary for experiments involving tracing of events. In a trace mode
the input information along with the time stamp and other information internal to the
monitor is transferred directly to the output storage medium.

The Waterloo monitor has a hardware time stamp register which can record 12 bits
of 'environment' information plus 24 bit time when any of the 8 selected events
happen. In addition it has a sequential event detector which can be programmed to
recognize a sequence of events given as a regular expression.
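
Recognizing an event sequence given as a regular expression can be imitated offline on a recorded event stream. The sketch below is not the Waterloo hardware design; it assumes events are numbered 0 to 7 and encodes them as the letters a to h so that Python's standard `re` module can do the matching. The function name is hypothetical.

```python
import re
from typing import Iterable, List


def detect_sequences(events: Iterable[int], expression: str) -> List[int]:
    """Return the 0-based stream positions at which a match of
    `expression` ends. Events 0..7 are encoded as letters 'a'..'h'."""
    letters = []
    for event in events:
        if not 0 <= event <= 7:
            raise ValueError("event numbers must lie in 0..7")
        letters.append(chr(ord("a") + event))
    text = "".join(letters)
    # finditer reports non-overlapping matches, scanning left to right.
    return [m.end() - 1 for m in re.finditer(expression, text)]
```

With the expression `ab*c` (event 0, then any number of event 1, then event 2), the stream 3, 0, 1, 1, 2, 0, 2 yields matches ending at positions 4 and 6.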

The Tesdata MS monitor employs a 'mapping' scheme of counters which uses 4
counters to count the occurrences of the possible combinations of two input signals
automatically. This scheme is used to determine the overlap between CPU and a
channel. The two input signals are 'CPU busy' and 'Channel busy' sampled at a certain
rate. The 4 counters then count the occurrences of both CPU and channel idle, only
CPU busy, only channel busy and both CPU and channel busy.
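
The mapping scheme amounts to using the two sampled signals as a 2-bit index into an array of four counters. A minimal sketch, with a hypothetical function name and sample format:

```python
from typing import Iterable, List, Tuple


def overlap_counters(samples: Iterable[Tuple[bool, bool]]) -> List[int]:
    """Count sampled (cpu_busy, channel_busy) combinations.

    Index 0: both idle, 1: only CPU busy,
    index 2: only channel busy, 3: both busy.
    """
    counters = [0, 0, 0, 0]
    for cpu_busy, channel_busy in samples:
        index = (int(bool(channel_busy)) << 1) | int(bool(cpu_busy))
        counters[index] += 1
    return counters


def overlap_fraction(counters: List[int]) -> float:
    """Fraction of samples in which CPU and channel were both busy."""
    total = sum(counters)
    return counters[3] / total if total else 0.0
```

Dividing the 'both busy' counter by the total number of samples estimates the CPU/channel overlap; the estimate is only as good as the sampling rate allows.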

It has been recognized that quite a few of the experiments fall into the category of
generating histograms, so some of the new monitors are capable of generating
histograms in hardware. This reduces the amount of data going to the P.sup thereby
removing the cause of a major bottleneck. The parameters of the histogram like the
upper and lower bounds are generally programmable from the P.sup. Another feature
that can be quite useful in certain experiments is a dynamically read/writable internal
register. Such a register allows dynamic alteration of an on-going experiment in
response to incoming data. As a response to an event, this register can be loaded
with the current address or data values or the clock value. This register can
subsequently be used in event detection and it can also be output when some other
event is detected. This register can be used for tracking relocatable pieces of code
and other measurements. If it is possible to store time differences in this register, then
determining the maximum time difference between two events becomes easy.
Hughes [HUGH74] used this technique to determine the maximum duration of the
interrupt disabled mode in a system.
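
The maximum-interval measurement can be sketched as follows. This is an illustration of the idea rather than the cited implementation: the trace format, the event names and the function name are assumptions. One variable stands for the register loaded with the clock value on the first event, and another for the register holding the largest time difference seen so far.

```python
from typing import Iterable, Optional, Tuple


def max_interval(trace: Iterable[Tuple[int, str]],
                 start: str, stop: str) -> int:
    """Largest clock difference between a `start` event and the next
    `stop` event in a trace of (clock, event_name) pairs."""
    started_at: Optional[int] = None  # register loaded on `start`
    longest = 0                       # register holding the maximum
    for clock, event in trace:
        if event == start:
            if started_at is None:    # keep the earliest pending start
                started_at = clock
        elif event == stop and started_at is not None:
            longest = max(longest, clock - started_at)
            started_at = None
    return longest
```

Applied to a trace of hypothetical 'disable' and 'enable' events, the function returns the longest period during which interrupts stayed disabled.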

A.2.3 Display of Information

This is another area where a lot of progress has been made since the early
monitors. For a summary type monitor, all that is really necessary is to display the
contents of the counters, preferably in decimal. For second generation monitors,
however, it is convenient to have a tape or disk storage unit and/or a supervisory
processor for on line display. Early commercial monitors used tape storage and
post-processing to generate reports. The disadvantage is that experiment set up errors
are not detected until after the post-processing is done. Most modern monitors are
capable of generating some real time display while also storing data for
post-processing.

Some standard output formats have evolved over time. The trend has been to
provide data in a pictorial format to make it easy to comprehend. Histograms are a
good example of providing a visual representation of a distribution function. Gantt
profiles have been used to summarize resource utilization and overlap (see Figure
A.3). Another representation of resource utilization is provided with a 'Kiviat graph'. It
is obtained by plotting equal number of 'good' and 'bad' indicators of performance on
alternate axes in a circle (see Figure A.4). The Kiviat graph has a lot of visual appeal
because all the utilization data is available at a glance and the shape of such a graph is
an indication of the 'goodness' of a computer system.

Since quite a few experiments use a histogram to display information, some
commercial monitors are equipped with special hardware to display histograms directly.
On the other hand, some monitors like the Remote Controlled Hardware Monitor at
Waterloo [MORG73] transfer information to a central computer over phone lines for
post-processing. In some other experiments, information is gathered in a data base for
long term trend analysis.


A.3. Comparison of some Hardware Monitors

A few of the old and new monitors are compared below. A few commercial monitors
(e.g. SUM) are no longer available and are therefore not discussed. Detailed
information could not be obtained regarding the Univac 1108 monitor and a commercial
monitor supplied by Computer Performance Instrumentation, Inc. The monitors
discussed are listed below in approximate chronological order:

1. 7090 portable monitor [IBM63]

2. Memory bus monitor [FRYE73]

3. Neurotron Monitor [ASCH71]

4. ADAM [HUGH74]

5. K.mon [FULL73]

6. RCHM [MORG73]

7. Dynaprobe 8016 [DYNA76]

8. Tesdata MS [TESD76]

Figure A.5: A Survey of Current Monitors

A.4. Trends in the Hardware Monitor Development

The advantages of providing programmable registers in the hardware monitor over
the earlier manual plug boards are obvious. With programmable registers, experiments
can be modified quickly if in error or if incoming data so requires. In fact, going one
step further, the hardware monitor or the host machine can be allowed to set certain
internal registers (e.g. comparators or hold registers). This feature can be used to
keep track of a program which can dynamically move in the main memory of the host.

Early in the development of hardware monitors it was realized that it is much more
convenient to use a minicomputer to perform some of the logical and arithmetic
functions rather than designing the hardware to do them. Moreover, the minicomputer
can be used to store the data on secondary storage as well as communicating with the
user. Depending on the rate of data gathering, the minicomputer can perform some
compaction and then display or print preliminary results while the experiment is still
going on.

If one is monitoring a processor on a chip, the input to the monitor is severely limited
to the bus coming out of the processor. None of the internal status bits are accessible.
In such cases, a self monitoring feature can be provided for microprogrammed
processors. The microcode can have hooks at interesting places to enable one to insert
measurement microcode. Such a monitoring system has all the hardware data in the
processor available and by careful overlapping of operations it can be made to cause
less perturbation than a pure software monitor. This might be a way to measure
microprogrammed processors on a chip. The disadvantage is that the status bits in the
peripheral devices are not directly accessible.

There is a distinct trend in providing some processing inside the hardware monitor.
The examples are histogram generation or moment calculation. As more common
processing requirements among different experiments are identified, it will be
justifiable to put them directly in hardware. Similarly, as common display formats are
discovered, they will be put in hardware or will be supplied as a standard set of
programs. There has been some progress in defining a measurement language. Such a
language will allow the users to specify the events to be detected, the information to
be gathered when an event occurs and finally the format in which the information is to
be displayed.
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Appendix  B:  The  Instruction  Mix  Experiment 

This appendix presents the output of the instruction mix experiment in detail. Since
chapter 4 identifies 'MOV' as the most frequently executed instruction, we present the
fractions for the execution of the 'MOV' instruction in each of the segments. Similar
information can be obtained for any other instruction or addressing mode but we will
not report it here for brevity. It is interesting to observe the large variation in the
fractions for the MOV instruction. The fractions range from more than 0.5 in program 1
in area 5 to 0.0001 in program 4 in area 5. In fact, it can be seen that program 4 in
area 5 consistently uses fewer MOV instructions than any of the other programs. We
traced the reason for this behavior to the fact that it is a hand-written program which
spends most of its time in a small tight loop containing very few MOV instructions.

In chapter 4 we reported that the variance between segments is small compared to
the other two sources of variance. The data presented here clearly brings out this
fact for the MOV instruction.

APPLICATION AREA 2: Business Cobol Benchmarks

                        Program #
Segment #       1       2       3       4       5
     1        .199    .306    .249    .247    .247
     2        .169    .312    .256    .253    .258
     3        .193    .302    .256    .223    .223
     4        .152    .298    .275    .271    .271
     5        .195    .302    .288    .219    .219
     6        .200    .303    .247    .284    .284
     7        .203    .298    .231    .248    .248
     8        .250    .304    .189    .215    .215
     9        .188    .303    .210    .275    .275
    10        .221    .306    .239    .270    .270
    11        .185    .332    .226    .196    .196
    12        .188    .323    .250    .272    .272
    13        .161    .248    .211    .289    .289
    14        .194    .206    .229    .197    .197
    15        .201    .216    .223    .249    .249
    16        .194    .290    .212    .272    .272
    17        .242    .265    .203    .233    .233
    18        .214    .215    .225    .276    .276
    19        .193    .252    .249    .288    .288
    20        .211    .288    .253    .289    .289
    21        .181    .274    .241    .294    .294
    22        .190    .321    .249    .284    .284
    23        .166    .305    .253    .289    .289
    24        .193    .304    .259    .294    .294

APPLICATION AREA 3: Operating Systems

                        Program #
Segment #       1       2       3       4       5
     1        .325    .313    .309    .306    .250
     2        .340    .294    .324    .309    .256
     3        .240    .278    .314    .307    .254
     4        .295    .291    .323    .304    .258
     5        .408    .329    .313    .309    .253
     6        .306    .306    .313    .303    .249
     7        .237    .305    .326    .313    .253
     8        .381    .313    .321    .294    .256
     9        .336    .302    .332    .294    .249
    10        .226    .336    .315    .317    .256
    11        .308    .312    .319    .311    .254
    12        .323    .290    .319    .323    .251
    13        .337    .313    .329    .296    .255
    14        .243    .281    .317    .308    .267
    15        .330    .307    .332    .307    .259
    16        .406    .235    .336    .315    .253
    17        .296    .311    .319    .323    .259
    18        .249    .298    .306    .312    .254
    19        .401    .319    .323    .305    .252
    20        .335    .322    .312    .222    .252
    21        .239    .306    .333    .308    .262
    22        .343    .292    .323    .293    .257
    23        .400    .313    .315    .294    .255
    24        .311    .312    .323    .322    .236
APPLICATION AREA 4: Systems Programs

                        Program #
Segment #       1       2       3       4       5
     1        .281    .339    .323    .163    .383
     2        .288    .338    .320    .197    .407
     3        .273    .343    .318    .208    .398
     4        .277    .343    .321    .231    .407
     5        .281    .350    .318    .229    .405
     6        .282    .331    .241    .187    .399
     7        .284    .351    .205    .165    .400
     8        .285    .361    .185    .160    .403
     9        .279    .347    .131    .182    .390
    10        .286    .347    .135    .172    .395
    11        .282    .344    .126    .202    .405
    12        .289    .350    .321    .168    .412
    13        .289    .352    .321    .175    .408
    14        .277    .348    .320    .195    .406
    15        .277    .350    .321    .210    .405
    16        .275    .341    .322    .187    .412
    17        .284    .341    .323    .184    .405
    18        .282    .346    .313    .185    .394
    19        .281    .344    .147    .194    .389
    20        .279    .340    .206    .190    .381
    21        .279    .343    .122    .183    .379
    22        .284    .339    .138    .179    .386
    23        .276    .330    .133    .187    .375
    24        .277    .349    .271    .217    .406
APPLICATION AREA 5: Real Time Programs

                        Program #
Segment #       1       2       3         4         5
     1        .470    .453    .354    .108        .403
     2        .430    .450    .347    .166        .406
     3        .462    .449    .361    .107        .406
     4        .465    .454    .348    .140E-2     .398
     5        .460    .448    .350    .450E-3     .403
     6        .474    .447    .391    .250E-3     .392
     7        .475    .447    .358    .000        .388
     8        .474    .447    .338    .100E-3     .383
     9        .465    .452    .339    .500E-3     .386
    10        .460    .453    .345    .239E-3     .385
    11        .472    .448    .322    .384E-1     .375
    12        .477    .457    .307    .416E-1     .382
    13        .459    .452    .345    .432E-1     .397
    14        .490    .446    .385    .126        .402
    15        .493    .448    .341    .129        .408
    16        .437    .448    .363    .149        .399
    17        .493    .450    .379    .129        .412
    18        .433    .445    .357    .110        .410
    19        .501    .449    .387    .138        .410
    20        .499    .459    .356    .149        .405
    21        .514    .454    .385    .140        .409
    22        .512    .447    .361    .124        .401
    23        .437    .453    .358    .458E-1     .395
    24        .471    .453    .359    .137E-1     .368
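
The observation that segments of one program differ far less than programs differ from one another can be checked on any of the tables. The sketch below uses only the first three segments of the five programs in Application Area 4 and compares the average within-program variance with the variance of the program means. It is a rough illustration with a hypothetical function name, not the experimental design analysis of chapter 4.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Tuple

# MOV fractions for segments 1-3 of programs 1-5, Application Area 4.
mov_fraction: Dict[int, List[float]] = {
    1: [0.281, 0.288, 0.273],
    2: [0.339, 0.338, 0.343],
    3: [0.323, 0.320, 0.318],
    4: [0.163, 0.197, 0.208],
    5: [0.383, 0.407, 0.398],
}


def variance_split(table: Dict[int, List[float]]) -> Tuple[float, float]:
    """Return (mean variance between segments within a program,
    variance between the per-program means)."""
    within = fmean(pvariance(values) for values in table.values())
    between = pvariance([fmean(values) for values in table.values()])
    return within, between


within, between = variance_split(mov_fraction)
print(f"between segments: {within:.6f}  between programs: {between:.6f}")
```

On this excerpt the between-program variance is more than an order of magnitude larger than the between-segment variance.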
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Appendix  C:  Comprehensive  Unibus  Cycle  Trace 

In this appendix we present a sample output from our analysis program which
analyses the Unibus cycle trace as discussed in section 6.1.4. Because of the
bandwidth limitations of K.mon, it is difficult to collect a trace consisting of a large
number of cycles. In this example, we were able to collect only about 20500 cycles
during 30 minutes. It should be noted that the data presented here has been obtained
for a single program and we cannot draw any general conclusions from this data. In
fact, this particular trace shows that only about 16 percent of the instructions were
MOV's whereas in chapter 4 we saw that the MOV instruction is executed on the
average about 31 percent of the time.

Once a Unibus cycle trace is collected for a large number of programs, it will be
possible to answer a variety of important design questions. The analysis presented
here is intended to give the reader an idea of the kind of analysis that can be done
with such a complete cycle trace.


Mode and register statistics for single operand instructions

mode                          Register number
number      0      1      2      3      4      5      6      7
  0      1242   1007    263    317    630    652      0      0
  1        33      2      0      7      0      3     65      0
  2         0      0      0      0      0      0     70      0
  3         0      0      0      0      0      0     88     24
  4         0      0      0      0      0      0    103      0
  5         0      0      0      0      0      0      0      0
  6        10     20      9    120      0      3     66    560
  7         0      0      1      0      0      0     75      0
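
The mode and register numbers tabulated here occupy the low six bits of a PDP-11 single operand instruction word: bits 5-3 give the addressing mode and bits 2-0 the register. The following is a minimal sketch of how such a table could be tallied from a list of instruction words. The function names and the three sample words are illustrative assumptions; they are not taken from the original analysis program or from the trace.

```python
from collections import Counter


def decode_single_operand(word: int) -> tuple[int, int]:
    """Return (mode, register) from the low six bits of a 16-bit word."""
    mode = (word >> 3) & 0o7   # bits 5-3: addressing mode
    register = word & 0o7      # bits 2-0: register number
    return mode, register


def tally(words):
    """Count occurrences of each (mode, register) pair."""
    return Counter(decode_single_operand(w) for w in words)


# Hypothetical sample (not from the trace): CLR R0, INC R1, TST (SP)+
sample = [0o005000, 0o005201, 0o005726]
table = tally(sample)

for (mode, register), count in sorted(table.items()):
    print(f"mode {mode}  register {register}  count {count}")
```

For double operand instructions the same decoding would be applied twice, once to bits 11-6 for the source and once to bits 5-0 for the destination.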

Source mode and register statistics for double operand instructions

mode                          Register number
number      0      1      2      3      4      5      6      7
  0       280    175    581    294    107     94     29      0
  1         2     30     27     19     57     43      6      7
  2         0      0      0      0      0      3    423    404
  3         0      0      0      0      0      3      0     64
  4         0      0      0      0      0      0      3      0
  5         0      0      0      0      0      0      0      0
  6        16     52     10      0      1      5    372    363
  7         0      0      0      0      0      0     21      4

Destination mode and register statistics for double operand instructions

mode                          Register number
number      0      1      2      3      4      5      6      7
  0       648    515    289    170    222    116     16      1
  1         9     16     69     57     42     67    132      0
  2         0      0      0      0      0      0     10    201
  3         0      0      0      0      0      0      0     49
  4         6      0      0      0      0      0    454      0
  5         0      1      0      0      0      0      0      0

6 

13 

2 

7 

1 

0 

1 

144 
7 
0 

0 

1 

0 

6 

0 
Histogram of the number of Unibus cycles per instruction


# OF SAMPLES 11854  MEAN 1.672  STD DEV. 1.038

MINIMUM 1  MAXIMUM 8  Total 19818

RANGE  COUNT 
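
The summary line of this histogram can be cross-checked from its own figures, since the mean is the total number of cycles divided by the number of instruction samples. A small sketch of that arithmetic, using only the values printed in the summary (the variable names are illustrative):

```python
# Values from the histogram summary for Unibus cycles per instruction.
samples = 11854        # number of instructions sampled
total_cycles = 19818   # total Unibus cycles over those instructions

# Mean cycles per instruction = total cycles / number of samples.
mean_cycles = total_cycles / samples
print(round(mean_cycles, 3))
```

The result agrees with the reported mean of about 1.672 cycles per instruction.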
Histogram of the index values off stack pointer:

This information can be used to decide the utility of providing a small field in every
instruction to specify the parameter number off the stack pointer.

Histogram of the immediate mode operands:

This information can be used to decide the utility of providing a small field in every
instruction to specify small immediate operands.
(  -1 TO   0]        0
(  13 TO  14]        4
(  14 TO  15]       27
(  15 TO  16]       16
(  16 TO +inf)     344


Histogram of the index values off registers other than SP


# OF SAMPLES 1437  MEAN -3890.991  STD DEV. 10350.534

MINIMUM -32694  MAXIMUM 32722  Total -5591354

RANGE  COUNT 
Appendix D: The Execution Profile for Hydra

The following is the execution profile of Hydra measured while executing a parallel
root finding program utilizing kernel semaphores for synchronization. The Hydra
kernel instructions were sampled at random using the hardware monitor. The changes
in the overlay code page were also monitored with the hardware monitor.


Relocation   Function            Number of
register                         instructions
    0        stack page                  0
    1        common data                82
    2        overlay data                0
    3        overlay data                0
    4        overlay code            10840
    5        common code              3987
    6        local memory             2739
    7        device registers            0

Total number of instructions sampled: 17648

The above table shows some instructions being executed out of a data page via
relocation register 1. This is not due to an error in the hardware monitor; the Hydra
implementors found it convenient to execute some instructions from a page which
otherwise contains only common shared data. Hydra consists of about 50 pages of
instructions and data. By monitoring the changes in the relocation register we were
able to identify which of the overlay code pages (from page 7 through page 47 in the
following listing) was being accessed through that register. In the following listing,
routines that do not have any samples in them are suppressed. For the common code
page (page 6), however, all global routine names are printed.
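
The addresses in the listing that follows appear consistent with the top three bits of the 16-bit address selecting the relocation register: common code addresses start at 120000 (octal), overlay code at 100000, and local memory at 140000. The sketch below illustrates that reading. It is an inference from the listing rather than a statement of the hardware design, and the function name is hypothetical.

```python
def relocation_register(address: int) -> int:
    """Return the relocation register number implied by a 16-bit address.

    Assumption (inferred from the listing, not stated in the text):
    the top three address bits select one of eight relocation registers.
    """
    return (address >> 13) & 0o7


# Addresses taken from the listing, written in octal as printed there.
print(relocation_register(0o120252))  # a common code routine
print(relocation_register(0o100144))  # an overlay code routine
print(relocation_register(0o154622))  # a local memory routine
```

Under this assumption the three addresses map to registers 5, 4 and 6, matching the functions given for those registers in the relocation table.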

Page  number  6  Total  Instructions:  3987 


address   routine name   number of samples
120000    CSUS.C         0
120006    ERRPRO         0
120232    HLNK.C         0
120252    SLINK
120274    SEHY11
120354    SEHY01
120460    SEHYXT
120502    TTTC.C
120532    TTWRIT
120574    oincH
120632    SIXOUT
120674    MOVE
120732    MOVE16
120734    MOVE15
120736    MOVE14
120740    MOVE13
120742    MOVE12
120744    MOVE11
120746    MOVE10
120750    MOVE9
120754    MOVE7
120756    MOVE6
120760    MOVE5
120764    MOVE3
120766    MOVE2
120770    MOVE1
120772    MOVE0
121006    MOVE4
121030    MOVE8
121062    PGCM.C
121074    STCMC
121076    GETPAG
121124    RETPAG
121162    SETHOR
121266    GTSZ1
121330    GTSZ2
121374    GTYP1
121430    GTYP2
121460    IDLE.C
121462    IDLEOI
121502    IDLE
121740    IDLE.P
121770    LVEC.C
122010    LVEC.P
122046    FPCM.C
122156    OBJSHR
122250    OBJDEL
122410    COBJDE
122502    OBJADE
122562    OBJASH
122646    FPSEM
122706    FVSEM
122766    FCSEM
123026    FCMUT          42
123066    FPMUT          5
123126    FVMUT          72
123240    WHTOBJ         0
123304    FPCHK          0
123322    WHTTOT         0
123334    WHTACT         0
123346    FPOD           0
123350    TYPFMA         2
123534    SHPTYP         0
123564    OBJTYP         1
123626    FTYPE          0
123660    TYPING         4
123722    TYPDLC         0
124014    ATYPDE         0
124106    WHTYPE         0
124156    TYUS           0
124160    DPART2         253
124270    DPART1         101
124412    RDPRT2         0
124526    RDPRT1         2
124564    DATA2M         33
125020    DATA1M         3
125064    RDATA2         1
125242    RDATA1         0
125306    XITEM2         17
125550    XITEM1         4
125614    ITEM2M         582
126000    ITEM1M         7
126044    WITEM2         0
126224    WITEM1         0
126270    DATALE         0
126370    ITEMLE         10
126556    MnCM.C         0
126560    FIRSTO         1
126630    DOOIKS         0
126712    BADMEM         0
126724    ITCM.C         20
127032    TYPOBJ         0
127072    PROCOB         1
127132    LNSOBJ         0
127172    POLICY         2
127232    PRCSC:3        1
127272    PGOBJ          0
127332    SEMOBJ         80
127372    PSCMOO         0
127432    DATAOB         0
127472    PORTOB         1
127532    DEVOBJ         0
127572    UNIVOB         0
127632    EQTYPE         0
127730    E0I5F.F        0
130042    FITYPE         0
130110    SITYPE         0
130122    FPRT1          362
130240    FPRT2          1
130352    ANDRTS         0
130502    ORRTS          0
130562    MOVRTS         1
130666    RTStlST        0
130754    CIII?TSQ       0
131046    CHKRTS         1
131136    PASSIT         0
131172    MAKNUL         3
131212    SHARIT         25
131470    DLTITM         14
131650    SETMPL         0
131736    MKITEM         0
132054    MAKITK^        0
132136    OBJC           0
132216    ITCM.P         0
132240    CDIT.C         0
132414    CAl.DEQ        0
132442    CKMP.C         1
132462    TSTLOC         0
132502    LOCK           315
132612    IPI7           178
133004    BDLOK          0
133020    BDULOK         0
133034    UNLOCK         388
133106    PCBCUR         2
133136    PRCSID         218
133342    INCRIT         59
133356    DICRIT         146
133416    PMUTEX         6
133460    CONDPM         33
133552    VMUTEX         69
133642    P              39
133670    CONDP          32
133724    V              66
133772    CKMP.P         0
134106    Dlv<NC.C       0
134110    TGSTDM         0
134130    SIGE.C         0
134134    SSIGN1         1
134154    SENABL         184
134216    K1.61.C        0
134234    S 1X005        0
134354    KM61.C         0
134710    KM61.P         0
          KCHK.C         0
134736    CHKADR         43
135034    ASA7.C         0
135224    UAiDl?.C       4
135250    CHKUAD         2
135416    CHKUBY         0
135546    UGER1M         2
135600    UGER2M         0
135632    SETUDl         0
135650    UADP           2
136324    GETGNA         0
136414    READGC         0


Page number 7  Total Instructions: 52

100000    fpoac          5
100406    GETAFP         1
101066    OBJCRE         22
102404    OBJPAT         6
104174    WHATDA         1
104362    STORDA         1
105006    COPYDA         1
110456    ACTPP          2
112220    PASOBJ         13


Page number 10  Total Instructions: 22

100002    GTPFPA         2
100720    GETPFP         7
104344    GETPPR         2
105016    FREEFP         3
106566    FREEGS         2
115024    TRUENU         6


Page number 11  Total Instructions: 23

107754    KMIT.C         5
110524    STORPC         1
111312    POLRCV         4
112106    SETPOL         1
112772    PORVLN         10
113742    STOPCU         1
115202    PROS 1C        1


Page number 12  Total Instructions: 392

101126    CHKLST         305
102420    NGETCO         72
103676    NRETCO         15
Page number 13  Total Instructions: 5

110264    HASH           2
110452    HTLINIC        1
115414    INCFPA         1
116106    SUBDPS         1


Page number 22  Total Instructions: 309

100000    IOTN.C         293
103366    TRCKWP         16


Page number 23  Total Instructions: 781

100000    MKRN.C         722
104504    POLCY          1
105502    RCVPOL         2
107270    RSTRFG         2
112776    SKRN.C         54


Page number 24  Total Instructions: 841

100002    NXTLOC         2
101406    COWPAR         1
102234    GETDAT         1
102524    PUTDAT         1
105452    STORE          2
106062    LOAD           9
107530    DELETE         7
111220    CWLK.C         199
112116    V4WALK         291
117020    PSEM           215
117400    VSEM           113


Page number 25  Total Instructions: 35

103452    START          2
105144    PSHCON         1
107356    LNS.C          1
107636    MERGE          2
111574    SETUPL         22
113162    DOCALL         1
114246    KRETUR         3
115236    TCALL          3


Page number 26  Total Instructions: 93

100000    IOTK.C         3
100042    MAPHT          2
100072    UNMAPH         6
100742    GETS           1
104036    ENQBEF         3
104172    DEQ            14
104320    OAPPEN         7
104442    COAPPE         2
104736    QREMOV         14
107104    lOGENf)        1
110126    0010           9
110426    RGETOP         2
110634    RGETBU         17
111650    RGETIN         4
116304    PRPRF1         8

Page number 30  Total Instructions: 732

100000    IORP.C         337
104220    PRPRP1         2
104710    UPRP11         95
105336    IORP.P         298


Page number 31  Total Instructions: 343

100000    FENO.O         1
100034    FEERCT         66
102174    FENllC         54
103426    FEOIR          86
104320    FEDISC         6
104546    FEDOWN         2
105122    FEUP           10
105506    ASA1.C         1
106110    PRASA          1
107630    UPASA          1
107766    ASAIO          13
112276    INTWIM         1
112416    STIMP          70
113342    PRPIMP         2
113700    UPIMP          2
114272    GENIPI         26
114700    FASTSE         1


Page number 34  Total Instructions: 114

100152    MSRV.C         46
104320    PACC.C         20
112076    MCREAT         1
112374    MREA.C         20
112674    MREAD          1
113172    MWRI.C         16
113472    MWRITE         1
113772    MSND.C         8
115400    MRSVP          1
Page number 35  Total Instructions: 13

102366    MRPY.C         1
103100    MPEPLY         1
100276    MRCV.C         7
105730    MWAIT          3
112310    VPOLSE         1


Page number 44  Total Instructions: 7084

100000    KMPS.C         149
100034    DELINK         257
100144    ENQ            1021
100542    FNDPRC         144
100726    REQPRO         981
101152    HlGHFl         337
101306    GETSPA         1
101376    FREESP         2
101446    PRIWIN         74
101470    ADDTIM         76
101524    SUBTIM         81
102234    INIWAT         700
102642    SWCXT          249
103174    SWAPTO         340
103550    SELCTE         1009
103662    SELCTS         275
104024    IPSCHE         271
105244    RETHIN         121
105414    INIT5E         623
106564    SEND           1
106722    RECEIV         3
107254    CLOCK          61
110150    KMPA.C         1
110504    SENDST         3
111460    STARTP         20
112714    KSTOPC         1
113626    RESCUE         1
114136    TIMSCH         1
115206    DELPRE         6
116270    PRSTRT         2
116362    PROELP         2
116454    PRSEM          120
116714    PRPX           58
116762    PRVX           38
117034    PRCPX          55


Page number 45  Total Instructions: 6

140000    TPPS.p         6
Page number 46  Total Instructions: 267

145000    LSUS.C         154
145222    MCLOCK         5
145264    IPCLOC         7
145300    MSCHED         14
145352    IPI4           32
145444    MIOT           55


Page number 47  Total Instructions: 2466

153000    LM10.C         116
153176    DRTI.C         5
154160    MULD.C         18
154550    SAVR.C         548
154564    SSAV3          273
154602    SSAV4          191
154622    SSAV5          1156
154770    SIX12          78
155462    XSIX12         81


Routine names in the order of decreasing samples:

This list is not normalized according to the size of the routines, so larger routines
may contain more samples even though they are not executed very often. Only routines
with more than 10 instruction samples are listed. Information presented here can be
used to decide which routines should be moved to the common code page (accessed
via RR5) and which can be put in the overlay pages without excessive cost.


Routines in the common code page are marked with *
Routines in the local memory are marked with L


Page     Address   Mark  Routine    (size)   Number of
number                                       samples
 47      154622     L    SSAV5      (  28)     1156
 44      100144          ENQ        ( 254)     1021
 44      103550          SELCTE     (  74)     1009
 44      100726          REQPRO     ( 148)      981
 23      100000          MKRN.C     (2132)      722
 44      102234          INIWAT     ( 262)      700
 44      105414          INIT5E     ( 450)      623
  6      125614     *    ITEM2M     ( 116)      582
 47      154550     L    SAVR.C     (  12)      548
  6      133034     *    UNLOCK     (  42)      388
  6      120252     *    SLINK      (  18)      363
  6      130122     *    FPRT1      (  78)      362
 44      103174          SWAPTO     ( 170)      340
 44      101152          MKIWI      (  46)      337
 30      100000          IORP.C     (2006)      337
  6      132502     *    LOCK       (  72)      315
 12      101126          CHKLST     ( 698)      305
 30      105336          IORP.P     (5409)      298
 22      100000          IOTN.C     ( 482)      293
 24      112116          V4WALK     ( 120)      291
 44      103662          SELCTS     (  98)      275
 47      154564     L    SSAV3      (  14)      273
 44      104024          IPSCHE     ( 572)      271
 44      100034          DELINK     (  30)      257
  6      124160     *    DPART2     (  72)      253
 44      102642          SWCXT      ( 218)      249
  6      133136     *    PRCSID     ( 132)      218
 24      117020          PSEM       ( 154)      215
 24      111220          CWLK.C     ( 446)      199
 47      154602     L    SSAV4      (  16)      191
  6      134154     *    SENABL     (  34)      184
  6      132612     *    IPI7       ( 122)      178
 46      145000     L    LSUS.C     ( 146)      154
 44      100000          KMPS.C     (  28)      149
  6      133356     *    ONCRIT     (  32)      146
 44      100542          FNDPRC     ( 116)      144
 44      105244          RETHIN     ( 104)      121
 44      116454          PRSEM      ( 160)      120
 47      153000     L    LM10.C     ( 126)      116
 24      117400          VSEM       (  88)      113
  6      124270     *    DPART1     (  82)      101
 30      104710          UPRP11     ( 278)       95
 31      103426          FEOIR      ( 442)       86
 47      155462     L    XSIX12     (  86)       81
 44      101524          SUBTIM     ( 328)       81
  6      127332     *    SEMOBJ     (  32)       80
 47      154770     L    SIX12      ( 264)       78
 44      101470          ADDTIM     (  28)       76
 44      101446          PRIWIN     (  18)       74
 12      102420          NGETCO     ( 686)       72
  6      123126     *    FVMUT      (  74)       72
 31      112416          STIMP      ( 468)       70
  6      133552     *    VMUTEX     (  56)       69
 31      100034          FEERCT     (1120)       66
  6      133724     *    V          (  38)       66
  6      122250     *    OBJDEL     (  96)       65
 44      107254          CLOCK      ( 332)       61
  6      122156     *    OBJSHR     (  58)       60
  6      133342     *    INCRIT     (  12)       59
 44      116714          PRPX       (  38)       58
 46      145444     L    MIOT       (  60)       55
 44      117034          PRCPX      (  42)       55
 31      102174          FENllC     ( 666)       54
 23      112776          SKRN.C     (  78)       54
 34      100152          MSRV.C     (1994)       46
  6      134736     *    CHKADR     (  62)       43
  6      123026     *    FCMUT      (  32)       42
  6      133642     *    P          (  22)       39
 44      116762          PRVX       (  42)       38
  6      133460     *    CONDPM     (  58)       33
  6      124564     *    DATA2M     ( 156)       33
 46      145352     L    IPI4       (  58)       32
  6      133670     *    CONDP      (  28)       32
  6      121030     *    MOVE8      (  26)       27
 31      114272          GENIPI     ( 208)       26
  6      131212     *    SHARIT     ( 174)       25
 25      111574          SETUPL     ( 758)       22
  7      101066          OBJCRE     ( 614)       22
 44      111460          STARTP     ( 668)       20
 34      112374          MREA.C     ( 192)       20
 34      104320          PACC.C     ( 266)       20
  6      126724     *    ITCM.C     (  70)       20
 47      154160     L    MULD.C     ( 164)       18
 26      110634          RGETBU     ( 524)       17
  6      125306     *    XITEM2     ( 162)       17
 34      113172          MWRI.C     ( 192)       16
 22      103366          TRCKMP     ( 230)       16
 12      103676          NRETCO     ( 716)       15
 46      145300     L    MSCHED     (  42)       14
 26      104736          QREMOV     (  82)       14
 26      104172          DEQ        (  86)       14
  6      131470     *    DLTITM     ( 112)       14
 31      107766          ASAIO      (1224)       13
  7      112220          PASOBJ     (1516)       13
 31      105122          FEUP       ( 244)       10
 11      112772          PORVLN     ( 182)       10
  6      126370     *    ITEMLE     ( 118)       10
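
Because the list is not normalized by routine size, one way to compare routines on a more equal footing is to divide each sample count by the listed size. The sketch below does this for five entries copied from the list. The density measure and the variable names are illustrative assumptions, not part of the original analysis, and the size is used in whatever unit the list reports.

```python
# (routine, size, samples) for five entries copied from the list above.
routines = [
    ("SSAV5", 28, 1156),
    ("ENQ", 254, 1021),
    ("SELCTE", 74, 1009),
    ("MKRN.C", 2132, 722),
    ("SAVR.C", 12, 548),
]


def density(entry):
    """Samples per unit of routine size (an illustrative normalization)."""
    _, size, samples = entry
    return samples / size


# Rank by density instead of by raw sample count.
ranked = sorted(routines, key=density, reverse=True)
for name, size, samples in ranked:
    print(f"{name:8s} size {size:5d}  samples {samples:5d}  "
          f"density {samples / size:6.2f}")
```

With this normalization the small routine SAVR.C ranks ahead of SSAV5, and the large routine MKRN.C drops to the bottom of the five, which shows how strongly routine size influences the raw ordering.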

Appendix E: RSX11-M Major Processing Functions Trace


In this appendix we present a short sample from the trace obtained for our study of
RSX11-M. It is intended to give the reader an idea of how a hardware monitor can be
used to gather a comprehensive trace of the activities happening inside an operating
system. It should be noted that this trace was obtained while executing a simple
command typed at a terminal. It is striking how much activity takes place in the
operating system even to process a simple request from a terminal.


Event trace for RSX11M:


Time since      Processor cycles   Description of the event
the beginning   since
(microsec)      the beginning

4088913         121926             Begin TTY input interrupt
4089247         122148             End interrupt service
4089538         122181             Begin TTY output interrupt
4089765         122332             End interrupt service
4126177         123135             Begin TTY output interrupt
4126341         123246             Begin $fork
4126371         123269             End interrupt service
4126457         123327             Begin TTY interrupt fork
4127068         123745             Begin TTY driver output
4127107         123769             End TTY output initiation
4127153         123800             Honor reschedule request
4127192         123825             Begin context switch
4127272         123881             >>>>>> Context switch to: LOADER
4127534         124058             End context switch
4127892         124303             Begin EMT processing
4128133         124464             $$$$$$ EMT code: 1 QUEUE I/O
4128208         124513             Begin QIO processing
4128940         124989             Begin RP04 driver initiation
4129792         125551             ****** Read on unit 0, word count = 2048,
                                   cylinder movement of 3
4129797         125555             End RP04 driver processing
4129919         125635             Begin EMT processing
4130149         125789             $$$$$$ EMT code: 41 Wait for single event flag
4130371         125925             Honor reschedule request
4130623         126021             Begin context switch
4130768         126077             >>>>>> Context switch to: LV4
4131144         126254             End context switch
4136291         126319             Begin RP04 interrupt processing
4136326         126345             Begin $fork
4136357         126368             End interrupt service
4136443         126426             Begin RP04 fork processing
4136829         126679             RP04 request task reschedule
4136837         126685             Begin RP04 driver initiation
4136880         126709             End RP04 driver processing
4136922         126740             Honor reschedule request
4136990         126786             Begin context switch
4137070         126842             >>>>>> Context switch to: LOADER
4137332         127019             End context switch
4139096         128193             Honor reschedule request
4139135         128218             Begin context switch
4139157         128233             >>>>>> Context switch to: Monitor Control Routine
4139420         128410             End context switch
4140080         128846             Begin EMT processing
4140309         129000             $$$$$$ EMT code: 1 QUEUE I/O
4140384         129049             Begin QIO processing
4141197         129578             Begin RP04 driver initiation
4142166         130220             ****** Read on unit 0, word count = 932,
                                   cylinder movement of 0
4142172         130224             End RP04 driver processing
4142426         130393             Begin EMT processing
4142656         130547             $$$$$$ EMT code: 41 Wait for single event flag
4142865         130686             Honor reschedule request
4143010         130782             Begin context switch
4143090         130838             >>>>>> Context switch to: LV4
4143360         131015             End context switch
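
A trace of this kind can be reduced to per-function costs by pairing each "Begin" event with the matching "End" event. The sketch below does this for context switches, using the first two context switches in the trace. The tuple layout and the function name are illustrative assumptions about how the trace might be represented, not the format used by the original analysis program.

```python
# (time in microseconds, processor cycles, description), from the trace.
events = [
    (4127192, 123825, "Begin context switch"),
    (4127534, 124058, "End context switch"),
    (4130623, 126021, "Begin context switch"),
    (4131144, 126254, "End context switch"),
]


def switch_costs(trace):
    """Return (elapsed microseconds, processor cycles) per context switch."""
    costs = []
    start = None
    for time_us, cycles, description in trace:
        if description == "Begin context switch":
            start = (time_us, cycles)
        elif description == "End context switch" and start is not None:
            costs.append((time_us - start[0], cycles - start[1]))
            start = None
    return costs


for elapsed_us, cycle_count in switch_costs(events):
    print(f"{elapsed_us} microsec, {cycle_count} processor cycles")
```

For these two switches the processor cycle count is the same (233) while the elapsed time differs (342 and 521 microseconds), which illustrates why the trace records both quantities.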



Appendix F: The Device Utilization Experiment

This appendix presents the output of our device activity analysis program. The
hardware monitor was used to monitor all the 'write's into any of the device registers
on the PDP-11 Unibus. Even though only two units of a device were active during the
data collection, the program is capable of analysing the activities of all the units of all
the devices present on the host system. Since the PDP-11 architecture does not employ
channels for I/O, there is no direct analogue for the quantity 'CPU and Channel Busy',
which is a very popular measure of component overlap in other systems. We have
defined another measure for PDP-11's which can be used to determine the overlap
between I/O activity and processor activity. We count the number of cycles initiated
by the processor from the time a disk is given an I/O command ('GO' bit is set in the
controller) until the completion interrupt is received from the device.
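
The overlap measure described above can be sketched as a single pass over a cycle trace. The event labels ("GO", "INTR", "CPU", "DEV") and the function name are hypothetical, chosen only for illustration. The sketch also assumes a single outstanding disk command and that counting starts with the first processor cycle after the 'GO' write; the text does not settle either point.

```python
def overlapped_cpu_cycles(trace):
    """Count processor cycles issued while a disk command is outstanding.

    trace: iterable of labels, one per traced event:
      "GO"   - 'GO' bit written to the disk controller
      "INTR" - completion interrupt received from the device
      "CPU"  - Unibus cycle initiated by the processor
      "DEV"  - Unibus cycle initiated by a device (ignored here)
    """
    device_busy = False
    count = 0
    for event in trace:
        if event == "GO":
            device_busy = True
        elif event == "INTR":
            device_busy = False
        elif event == "CPU" and device_busy:
            count += 1
    return count


# Hypothetical trace fragment, not taken from the measurements.
sample = ["CPU", "GO", "CPU", "DEV", "CPU", "INTR", "CPU"]
print(overlapped_cpu_cycles(sample))
```

Only the two processor cycles between the "GO" and "INTR" events are counted in this fragment; processor cycles before the command and after the interrupt are not.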

It should be noted that due to limitations in the output bandwidth of K.mon, we could
not get a continuous trace of device activity. Some small discrepancies are therefore
present in the data reported here; e.g., for unit 1 of the disk the number of write check
transfers is larger than the number of write transfers. This probably arises because a
write operation was missed while the hardware monitor was recovering from an
overflow, but the following write check operation was successfully traced.


Total time of measurement : 46630.576 millisec
Total cycles on the Unibus: 4890778
Total processor cycles : 4351877
Total device cycles in actual transfers: 427176
Overlapped cycles between CPU and RP11: 431103

DISK ACTIVITY DISTRIBUTION FOR RP11
Total number of transfers - 140